Abstract: We consider estimation of a step function $f$ from noisy observations of a deconvolution $\phi * f$, where $\phi$ is some bounded $L_1$-function. We use a penalized least squares estimator to reconstruct the signal $f$ from the observations, with penalty equal to the number of jumps of the reconstruction. Asymptotically, it is possible to correctly estimate the number of jumps with probability one. Given that the number of jumps is correctly estimated, we show that the corresponding parameter estimates of the jump locations and jump heights are $n^{-1/2}$ consistent and converge to a joint normal distribution with covariance structure depending on $\phi$, and that this rate is minimax for bounded continuous kernels $\phi$. As a special case we obtain the asymptotic distribution of the least squares estimator in multiphase regression and generalisations thereof. In contrast to the results obtained for bounded $\phi$, we show that for kernels with a singularity of order $O(|x|^{-\alpha})$, $1/2 < \alpha < 1$, a jump location can be estimated at a rate of $n^{-1/(3-2\alpha)}$, which is again the minimax rate. We find that these rates do not depend on the spectral information of the operator but rather on its localization properties in the time domain. Finally, it turns out that adaptive sampling does not improve the rate of convergence, in strict contrast to the case of direct regression.
42A82, 46E22.
reproducing kernel Hilbert spaces, minimax rates, adaptive sampling, optimal design.



1. Introduction 



Assume we have observations from a regression model given by

$$Y_i = (\Phi f)(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$



where $\Phi f = \phi * f$ denotes the convolution of some $L_1$-functions $\phi$ and $f$, and $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. mean zero random variables with finite second moment. In the following we refer to model (1) as the inverse (deconvolution) regression model, and we assume throughout that $\phi$ is known. Suppose the objective function $f: [0,1] \to \mathbb{R}$ is in $L_1$ and moreover locally constant, i.e. a piecewise constant function with $k$ jumps given by

$$f(x) = \sum_{i=1}^{k+1} b_i 1_{[\tau_{i-1}, \tau_i)}(x), \qquad (2)$$

s.t. $-\infty = \tau_0 < 0 < \tau_1 < \ldots < \tau_k < 1 < \tau_{k+1} = \infty$ and $k \in \mathbb{N}$ possibly unknown (see Figure 1). Figure 1 shows the difficulty of estimating jumps in inverse regression: due to the smoothing by $\phi$, jumps only appear as small changes in $\Phi f$.

In this paper we show that the joint least squares estimator $\hat\theta_n$ of jumps and heights

$$\theta = (b_1, \tau_1, b_2, \tau_2, \ldots, b_k, \tau_k, b_{k+1}) \qquad (3)$$

is $n^{-1/2}$ consistent and follows a multivariate normal limit law. This is in strict contrast to the case of direct regression (where $\Phi$ in (1) is the identity). In the latter case it is known that the LSE converges at the (minimax) $n^{-1}$ rate and that its distribution (after recentering and rescaling with $n$) is given by the minimizer of a certain random walk process. Further, jump heights and locations are asymptotically independent (see Yao and Au (1989), van de Geer (1988), Müller (1992), Müller and Stadtmüller (1999), Yakir et al. (1999) and Birgé and Massart (2006) for some references on jump estimation in direct regression). Finally, by an adaptive choice of the design points it is possible to speed up the $n^{-1}$ rate to any polynomial rate of convergence (Lan et al. (2007)). We will see that in inverse regression the situation is completely different with respect to all of these issues: in general, all components of $n^{1/2}(\hat\theta_n - \theta)$ will be asymptotically dependent (depending on the kernel $\phi$). Further, and rather surprisingly, the $n^{-1/2}$ rate does not depend on the decay of the Fourier transform of $\phi$, which usually determines the rate of convergence in more common function spaces such as Sobolev spaces (cf. Cavalier and Tsybakov (2002), among others). Indeed, we will show that the $n^{-1/2}$ rate is minimax if $\phi$ is a bounded, continuous function. Because our minimax lower bound is independent of the design points, we obtain the surprising finding that adaptive sampling cannot improve the rate of convergence in the inverse case.



Fig 1. Noisy observations of a blurred step function. The dots represent the observations and the black line the blurred function, where $\Phi$ represents convolution with the Gauss kernel. The gray line shows the original step function $f$, which is to be estimated.



In fact, a main motivation to consider the space of locally constant functions as in (2) stems from the observation that deconvolution is in general a difficult problem, which is reflected by rates of convergence that can be arbitrarily slow, e.g. $(\log n)^{-\beta}$ rates for supersmooth (e.g. Gaussian) deconvolution (cf. Butucea and Tsybakov (2007)). However, we stress that in many practical situations Gaussian deconvolution is still applied and leads to satisfactory results (see e.g. Bissantz et al. (2007) for an example in astrophysics). At first glance this seems contradictory. However, a minimax result often leads to a rather pessimistic view, in particular in large function classes such as Sobolev spaces. Often, more restrictive modeling is possible and necessary to obtain reasonably good rates of convergence. In fact, the space of locally constant functions considered in this paper (albeit of dimension $\infty$) generically yields an $n^{-1/2}$ rate of convergence, which renders deconvolution in this setting a practically feasible task. Indeed, in this case the correct (and finite) number of jumps will be estimated asymptotically, and the problem reduces to a (nonsmooth) nonlinear regression problem.

We will give general conditions which are sufficient to deduce the $n^{-1/2}$ rate. These conditions are borrowed from the theory of radial basis functions in native Hilbert spaces and from total positivity. They cover super-smooth functions such as the Gauss kernel, polynomial kernels $\phi(x) = x^p 1_{[0,1)}(x)$ with $p = 0, 1, \ldots$, and continuous symmetric functions $\phi$ whose Fourier transform has an at most polynomial decay, satisfying $\hat\phi(x) \ge C(1 + |x|^{n_0})^{-1}$ for some $n_0 \in \mathbb{N}$, $C > 0$.

If the number of jumps is unknown, we show that, under the additional assumption of subgaussian tails of the error distribution, the number of jumps can asymptotically be estimated correctly with probability one.

We mention that our results can also be shown for more general Fredholm integral operators of the type $\Phi f = \int K(x,y) f(y)\,dy$ with continuous kernel $K: [0,1] \times \mathbb{R} \to \mathbb{R}$ (see Boysen, 2006). For reasons of simplicity and ease of notation we do not treat this case here.

A classical model which fits into our framework was given by Quandt (1958). He introduced a linear regression model which obeys two separate regimes and where the change-point is not known. This model is called two-phase regression, and inference in this setting was studied by Quandt (1960), Sprent (1961), Hinkley (1969) and more recently by van de Geer (1988), Yakir et al. (1999) and Koul et al. (2003), among others. If the objective function is assumed to be continuous, two-phase regression can be modeled by an inverse regression model with a polynomial kernel with $p = 0$, i.e. $\phi(x) = 1_{[0,1)}(x)$. In this setting the $n^{-1/2}$ rate and the asymptotic distribution were derived by Hinkley (1969) and, for more general segmented regression models, by Feder (1975). From the perspective of a statistical inverse problem their results are quite natural: multiphase regression corresponds to the estimation of a jump function in a noisy Volterra equation, where the locations of the jumps correspond to the kinks of the multiphase regression function.

Our results generalize the known results on the estimation of the intersection in two-phase regression to the case where the objective function has an arbitrary number of phases and is piecewise polynomial of order $p + 1$, with $p$ continuous derivatives and a $(p+1)$-th derivative which is a step function. For piecewise linear regression ($p = 1$) in a deconvolution context this problem occurs in rheology, where the relaxation time spectrum has to be estimated from measurements of the dynamic moduli of materials (cf. Roths et al. (2000)). Other applications stem from biophysics, where the ion-channel activity of lipid membranes is measured by impedance spectroscopy and the jump locations indicate different opening states (cf. Schmitt et al. (2006), Romer et al. (2004)). We obtain the somewhat surprising result that the rate of estimating the change-point does not depend on $p$, whereas in general nonparametric regression settings the convergence rates for estimating a jump in the $p$-th derivative become slower as $p$ grows (see Raimondo (1998)).



The first to investigate the change-point problem in the framework of a statistical inverse problem was Neumann (1997), who considered the estimation of a change-point in a density deconvolution model $Y = X + \varepsilon$ with known error density $f_\varepsilon$. He treated the case that the density of $X$ is bounded, has one jump at $\tau$ and is Lipschitz continuous elsewhere. In this setting $\tau$ can be estimated at a rate of $\min(n^{-1/(2\beta+1)}, n^{-1/(\beta+3/2)})$, provided the tails of the Fourier transform $\hat f_\varepsilon(x)$ decrease at a rate of $|x|^{-\beta}$. Moreover, he proved that these rates are optimal in a minimax sense. This result was extended by Goldenshluger et al. (2006c) (in a white noise model) to classes of functions $f$ which can be written as the sum of a step function and a function with smooth $m$-th derivative. They showed that in this case the minimax rates are of order $\min(n^{-1/(2\beta+1)}, n^{-(m+1)/(2\beta+2m+1)})$. If the smooth part of the function of interest belongs to a Paley-Wiener class, they show that a rate of $\min(n^{-1/2}, n^{-1/(2\beta+1)})$ can be obtained up to a logarithmic factor. Their recent work (Goldenshluger et al., 2006a,b) generalizes these results to a unifying framework of sequence space models covering delay and amplitude estimation, estimation of change-points in derivatives and change-point estimation in a convolution white noise model. We remark that the specific choice of jump functions in (2) used in this work comes close to the super-smooth case for $\beta > 1/2$, but we can get rid of the additional logarithmic factor. Moreover, we will see that similar rates hold in the case $\beta < 1/2$ if the assumption on the boundedness of the kernel is dropped (see Remark 3).

This work is structured as follows. Section 2 gives some basic notation and the main assumptions. The estimate and its asymptotic properties are given in Section 3, and the proof of the main result can be found in Section 4. In Section 5 we derive the required results from the theory of radial basis functions, which yield sufficient conditions on $\phi$ for the asymptotic normality of the LSE. Finally, in Section 6 we derive the minimax rate for estimating the jump location.

2. Model assumptions and Notation 
2.1. Notation 

Define

$$T_k := \{(\gamma_0, \gamma_1, \ldots, \gamma_{k+1}) : -\infty = \gamma_0 < 0 < \gamma_1 < \ldots < \gamma_k < 1 < \gamma_{k+1} = \infty\}$$

as the set of possible jumps of $f$ in (2), and denote the corresponding function space of locally constant functions with at most $k$ jumps by

$$\mathcal{T}_k := \Big\{ \sum_{i=1}^{k+1} b_i 1_{[\gamma_{i-1}, \gamma_i)} : b_1, \ldots, b_{k+1} \in \mathbb{R},\ (\gamma_0, \ldots, \gamma_{k+1}) \in T_k \Big\}.$$

Write $\mathcal{T}_\infty := \bigcup_{k=1}^\infty \mathcal{T}_k$ for the set of all step functions on $\mathbb{R}$ with a finite but arbitrary number of jumps, where we exclude an isolated jump at the end points of the interval $[0,1]$. Note that outside of $[0,1]$ these functions are constant. Let $\mathcal{T}_{k,R} := \{g \in \mathcal{T}_k : \|g\|_\infty \le R\}$ as well as $\mathcal{T}_{\infty,R} := \bigcup_{k=1}^\infty \mathcal{T}_{k,R}$ be the corresponding spaces of uniformly bounded functions for some $R > 0$. If not mentioned otherwise, the restrictions of these spaces to $[0,1]$ are considered to be subspaces of $L_2([0,1])$.

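Define the empirical norm $\|\cdot\|_n$ and the empirical inner product $\langle \cdot, \cdot \rangle_n$ by

$$\|f\|_n^2 := \frac{1}{n}\sum_{i=1}^n f(x_i)^2 \quad \text{as well as} \quad \langle f, g \rangle_n := \frac{1}{n}\sum_{i=1}^n f(x_i)\, g(x_i),$$

where $x_1, \ldots, x_n$ are the design points. Similarly set

$$\|y\|_n^2 := \frac{1}{n}\sum_{i=1}^n y_i^2 \quad \text{as well as} \quad \langle y, z \rangle_n := \frac{1}{n}\sum_{i=1}^n y_i z_i$$

for $y, z \in \mathbb{R}^n$.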
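
A direct transcription of these definitions may help when experimenting numerically. The following Python sketch is only an illustration; the function names and the design points are our own, hypothetical choices.

```python
from typing import Callable, Sequence


def emp_inner(f: Callable[[float], float], g: Callable[[float], float],
              xs: Sequence[float]) -> float:
    """Empirical inner product <f, g>_n = (1/n) * sum_i f(x_i) g(x_i)."""
    return sum(f(x) * g(x) for x in xs) / len(xs)


def emp_norm_sq(f: Callable[[float], float], xs: Sequence[float]) -> float:
    """Squared empirical norm ||f||_n^2 = <f, f>_n."""
    return emp_inner(f, f, xs)


def vec_inner(y: Sequence[float], z: Sequence[float]) -> float:
    """<y, z>_n = (1/n) * sum_i y_i z_i for vectors y, z in R^n."""
    if len(y) != len(z):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(y, z)) / len(y)


xs = [0.1, 0.4, 0.6, 0.9]                      # hypothetical design points
step = lambda x: 1.0 if x >= 0.5 else -1.0     # a step function with one jump at 0.5
print(emp_norm_sq(step, xs))                   # 1.0, since step(x)^2 = 1 everywhere
print(vec_inner([1.0, 2.0], [3.0, 4.0]))       # (3 + 8) / 2 = 5.5
```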

Write $g(t+) := \lim_{x \downarrow t} g(x)$ for the right limit of $g$ in $t$ and $g(t-) := \lim_{x \uparrow t} g(x)$ for the corresponding left limit. For some proper function $g: \mathbb{R} \to \mathbb{R}$ define the set of jump points of $g$ as

$$J(g) := \{t \in [0,1] : g(t-) \ne g(t+)\} \qquad (4)$$

and $J_\#(g) := \#J(g) + 1$, where $\#J(g)$ denotes the number of jumps of $g$, which may be infinite.

Define the distance of some point $a \in \mathbb{R}$ to the set $B \subset \mathbb{R}$ as

$$d(a, B) = \inf_{b \in B} |a - b|$$

and, slightly abusing notation, the Hausdorff distance of two sets $A$, $B$ as

$$d(A, B) = \max\Big\{\sup_{a \in A} d(a, B),\ \sup_{b \in B} d(b, A)\Big\}.$$

Finally, for ease of notation, for any $a, b \in \mathbb{R}$, $[a, b]$ and $(a, b)$ always denote the intervals $[\min(a,b), \max(a,b)]$ and $(\min(a,b), \max(a,b))$, respectively.
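
Definition (4) and the Hausdorff distance can be evaluated directly for step functions that are given by their heights and jump locations. The Python sketch below is a minimal illustration for finite jump sets, and its helper names are hypothetical. A location where the heights on both sides agree is not counted as a jump, which is why $J_\#$ can be smaller than the number of pieces.

```python
import math
from typing import List, Sequence


def jump_set(b: Sequence[float], tau: Sequence[float]) -> List[float]:
    """J(f) for f = sum_i b_i 1_[tau_{i-1}, tau_i), given interior jumps tau_1 < ... < tau_k.

    A location counts only if the heights on both sides differ and it lies in [0, 1].
    """
    assert len(b) == len(tau) + 1
    return [t for i, t in enumerate(tau) if b[i] != b[i + 1] and 0.0 <= t <= 1.0]


def point_to_set(a: float, B: Sequence[float]) -> float:
    """d(a, B) = inf_{b in B} |a - b| (infinite for an empty set B)."""
    return min((abs(a - b) for b in B), default=math.inf)


def hausdorff(A: Sequence[float], B: Sequence[float]) -> float:
    """d(A, B) = max(sup_{a in A} d(a, B), sup_{b in B} d(b, A)) for finite sets."""
    if not A and not B:
        return 0.0
    return max(max((point_to_set(a, B) for a in A), default=0.0),
               max((point_to_set(b, A) for b in B), default=0.0))


J = jump_set([0.0, 1.0, 1.0, 2.0], [0.2, 0.5, 0.7])  # no jump at 0.5: equal heights
print(J, len(J) + 1)                                   # [0.2, 0.7] 3, i.e. J_#(f) = 3
print(hausdorff(J, [0.25]))                            # approximately 0.45
```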

2.2. Assumptions 

Assumptions on the error. If the number of jumps is known, the following basic assumption is sufficient to deduce the $n^{-1/2}$ rates of convergence for the least squares estimates.

Assumption A. The array $(\varepsilon_1, \ldots, \varepsilon_n)$ consists of independent identically distributed random variables with mean zero for every $n$. Additionally, assume

$$E(\varepsilon_1^2) = \sigma^2 < \infty.$$

If the number of jumps of the objective function is unknown, we will additionally need that the error satisfies the following subgaussian condition.

(A1) There exists some $a > 0$ such that $E(\exp(\varepsilon_1^2 / a)) < \infty$.

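Assumptions on the kernel. The parameters of $f$ in (2) and $\Phi$ are identifiable if

$$f \in \mathcal{T}_{k,R} \ \text{and} \ 0 = \|(\Phi f)(\cdot)\|_{L_2([0,1])} \ \text{imply} \ f = 0. \qquad (5)$$

By

$$(\Phi f)(\cdot) = \sum_{i=1}^{k+1} b_i (\Phi 1_{[\tau_{i-1}, \tau_i)})(\cdot),$$

relation (5) holds if the functions

$$(\Phi 1_{[\tau_0, \tau_1)})(\cdot),\ (\Phi 1_{[\tau_1, \tau_2)})(\cdot),\ \ldots,\ (\Phi 1_{[\tau_k, \tau_{k+1})})(\cdot) \qquad (6)$$

are linearly independent in $L_2([0,1])$ for any $(\tau_0, \ldots, \tau_{k+1}) \in T_k$.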

Throughout the following we require a slightly stronger condition: the independence of the functions in (6) together with their derivatives.

Assumption B. Let

$$A_\phi(x, a, b) := \begin{cases} \int_a^b \phi(x - y)\,dy, & b \ne a, \\ \phi(x - a), & b = a. \end{cases} \qquad (7)$$

Assume that $\phi \in L_1(\mathbb{R}) \cap L_2(\mathbb{R}) \cap L_\infty(\mathbb{R})$ is piecewise continuous with finitely many jumps. Additionally, the functions

$$A_\phi(x, \tau_0, \tau_1),\ A_\phi(x, \tau_1, \tau_2),\ \ldots,\ A_\phi(x, \tau_k, \tau_{k+1})$$

are linearly independent for every choice of $k \in \mathbb{N}$ and

$$-\infty = \tau_0 < 0 < \tau_1 \le \tau_2 \le \ldots \le \tau_k < 1 < \tau_{k+1} = \infty,$$

where only two subsequent $\tau_i$ are allowed to be equal.
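
To make (7) concrete, the sketch below evaluates $A_\phi$ for the standard Gaussian kernel. This kernel is only an example, and by Theorem 2.1 (ii) it satisfies Assumption B. The sketch then assembles $\Phi f = \sum_i b_i A_\phi(\cdot, \tau_{i-1}, \tau_i)$ for $f$ as in (2). The case $b = a$ coincides with the derivative of $\int_a^b \phi(x-y)\,dy$ with respect to its upper limit, and the finite-difference comparison at the end illustrates this. All function names are hypothetical.

```python
import math
from typing import Sequence


def gauss_cdf(u: float) -> float:
    # Standard normal CDF; math.erf(+/-inf) = +/-1, so infinite arguments are fine.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def gauss_pdf(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def A_phi(x: float, a: float, b: float) -> float:
    """Definition (7) for the standard Gaussian kernel phi.

    For b != a: int_a^b phi(x - y) dy = F(x - a) - F(x - b) (a signed integral).
    For b == a: phi(x - a), the derivative of the integral w.r.t. its upper limit.
    """
    if a == b:
        return gauss_pdf(x - a)
    return gauss_cdf(x - a) - gauss_cdf(x - b)


def blurred_step(x: float, b: Sequence[float], tau: Sequence[float]) -> float:
    """(Phi f)(x) = sum_i b_i A_phi(x, tau_{i-1}, tau_i) for f as in (2)."""
    taus = [-math.inf, *tau, math.inf]
    return sum(bi * A_phi(x, taus[i], taus[i + 1]) for i, bi in enumerate(b))


# A constant function is reproduced exactly, because phi integrates to one.
print(blurred_step(0.3, [2.0, 2.0], [0.5]))
# Finite-difference check of the b == a case at x = 0.3, a = 0.1, b = 0.6.
h = 1e-6
fd = (A_phi(0.3, 0.1, 0.6 + h) - A_phi(0.3, 0.1, 0.6)) / h
print(fd, A_phi(0.3, 0.6, 0.6))   # the two numbers agree to about 1e-6
```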

The following theorem gives some general conditions which are sufficient for $\phi$ to satisfy Assumption B.

Theorem 2.1. The function $\phi$ satisfies Assumption B if one of the following conditions is satisfied.

(i) $\phi \in C(\mathbb{R}) \cap L_1(\mathbb{R})$ is a symmetric real-valued function with Fourier transform $\hat\phi(x) > 0$, such that there exist $n_0 \in \mathbb{N}$ and $C > 0$ with

$$C(1 + |x|^{n_0})^{-1} \le |\hat\phi(x)| \quad \text{for all } x \in \mathbb{R}. \qquad (8)$$

(ii) $\phi$ is extended sign regular of order $k + 2$ on $\mathbb{R}$, and $0 < \int \phi(x)\,dx < \infty$.

(iii) The function $\phi$ is given by

$$\phi(x) = \begin{cases} x^p, & x \in [0,1], \\ 0, & \text{else}, \end{cases} \qquad p \in \{0, 1, 2, \ldots\}.$$



The proof of part (i) is given in Section 5; the proofs of parts (ii) and (iii) are straightforward and can be found in Boysen (2006), Sections 5.2 and 5.3. A definition of sign regularity can, for example, be found in Karlin and Studden (1966). Note that part (ii) covers the Gauss kernel $\phi(x) = (2\pi)^{-1/2}\exp(-x^2/2)$ (see Section 3, Example 5 in Karlin and Studden (1966)).



Examples of kernels which satisfy condition (i) are the Laplace kernel $\phi(x) = \exp(-|x|)/2$, the kernel $\phi(x) = \cos(x)\exp(-|x|)$ and kernels of the type $\phi(x) = (1 - |x|)_+^p$ for $p = 2, 3, \ldots$, where $x_+$ denotes the positive part of $x$. Moreover, the convolution of any two kernels $\phi_1$, $\phi_2$ satisfying (i) clearly also satisfies this condition.
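
For the Laplace kernel the Fourier transform is available in closed form. With the convention $\hat\phi(\omega) = \int \phi(x) e^{-i\omega x}\,dx$ one has $\hat\phi(\omega) = 1/(1+\omega^2)$, so (8) holds with $n_0 = 2$ and $C = 1$. The Python sketch below checks this closed form numerically by a trapezoidal rule on $[-40, 40]$. The truncation interval and the step size are our own choices, and since the kernel is symmetric only the cosine part is computed.

```python
import math


def laplace_kernel(x: float) -> float:
    return 0.5 * math.exp(-abs(x))


def fourier_transform(phi, w: float, L: float = 40.0, h: float = 1e-3) -> float:
    """Trapezoidal approximation of int_{-L}^{L} phi(x) cos(w x) dx.

    For a symmetric real kernel the imaginary part of the transform vanishes.
    """
    m = int(round(2 * L / h))
    total = 0.5 * (phi(-L) * math.cos(-w * L) + phi(L) * math.cos(w * L))
    for j in range(1, m):
        x = -L + j * h
        total += phi(x) * math.cos(w * x)
    return total * h


results = []
for w in (0.0, 1.0, 3.0, 10.0):
    approx = fourier_transform(laplace_kernel, w)
    exact = 1.0 / (1.0 + w * w)   # equals C (1 + |w|^{n_0})^{-1} with C = 1, n_0 = 2
    results.append((w, approx, exact))
    print(w, approx, exact)
```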



Assumptions on the design points. We make the following assumption on the design points.

Assumption C. There exists a function $h: [0,1] \to [c_l, c_u]$ with $0 < c_l \le c_u < \infty$ and $\int_0^1 h(x)\,dx = 1$, such that

$$\frac{i}{n} = \int_0^{x_{(i)}} h(x)\,dx + \delta_i$$

for all $i = 1, \ldots, n$, with $\max_{i = 1, \ldots, n} |\delta_i| = O_P(n^{-1/2})$.

Moreover, the design points $x_1, \ldots, x_n$ are independent of the error terms $\varepsilon_1, \ldots, \varepsilon_n$. Here $x_{(i)}$ denotes the $i$-th order statistic of $x_1, \ldots, x_n$.

Note that the above assumption covers random designs as well as fixed designs generated by a regular density in the sense of Sacks and Ylvisaker (1970). If the design points $x_1, \ldots, x_n$ are nonrandom, the $O_P(n^{-1/2})$ term above is to be understood as $O(n^{-1/2})$. In this case the design points have to be understood as a triangular scheme. Dümbgen and Johns (2004) use a similar assumption on the design points.
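
One simple way to obtain a fixed design that satisfies Assumption C with $\delta_i = 0$ is to take quantiles of the design distribution, $x_i = H^{-1}(i/n)$ with $H(x) = \int_0^x h(u)\,du$. The Python sketch below does this for the hypothetical density $h(x) = 1/2 + x$, which takes values in $[1/2, 3/2]$ and integrates to one. It is only an illustration and is not necessarily the construction of Sacks and Ylvisaker (1970).

```python
import math
from typing import List


def H(x: float) -> float:
    # Distribution function of the hypothetical design density h(x) = 1/2 + x on [0, 1].
    return 0.5 * x + 0.5 * x * x


def H_inv(u: float) -> float:
    # Solve x^2 + x - 2u = 0 for the root in [0, 1].
    return (-1.0 + math.sqrt(1.0 + 8.0 * u)) / 2.0


def regular_design(n: int) -> List[float]:
    # x_(i) = H^{-1}(i / n), i = 1, ..., n
    return [H_inv(i / n) for i in range(1, n + 1)]


xs = regular_design(100)
deltas = [i / len(xs) - H(x) for i, x in enumerate(xs, start=1)]  # delta_i in Assumption C
print(max(abs(d) for d in deltas))  # zero up to floating point rounding
```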

Note that if we assume a triangular scheme and fixed design points instead, all results can be obtained in essentially the same way. The only argument which has to be slightly modified is the one based on the law of the iterated logarithm in the proof of Lemma 4.13. Note that the respective inequalities remain valid, because the error terms in a triangular scheme can be replaced by a distributionally equivalent sequence of i.i.d. random variables.
Estimate. Define the restricted least squares estimate $\hat f_n$ as an approximate minimizer of the empirical $L_2$ distance to the data in the space $\mathcal{T}_{k,R}$. More precisely, $\hat f_n \in \mathcal{T}_{k,R}$ and

$$\|\Phi \hat f_n - Y\|_n^2 \le \min_{g \in \mathcal{T}_{k,R}} \big(\|\Phi g - Y\|_n^2\big) + o_P(n^{-1}). \qquad (9)$$

The minimizer of the functional on the right-hand side always exists (compare Lemma 4.6). Note that we do not assume that the minimum is attained by $\hat f_n$, but only that the functional above is minimized up to some term of order $o_P(n^{-1})$. The estimate does not need to be unique. This assumption allows for numerical approximation of the minimizer and indicates the precision needed for the asymptotic results to be valid. The restriction to functions with $\|f\|_\infty \le R$ is a technical assumption, which requires that some upper bound on the supremum norm of the objective function is known beforehand.
Note that any estimator $\hat f_n$ has a representation as

$$\hat f_n(x) = \sum_{i=1}^{k+1} \hat b_i 1_{[\hat\tau_{i-1}, \hat\tau_i)}(x), \qquad (10)$$

with vectors $\hat b = (\hat b_1, \ldots, \hat b_{k+1})^t$ and $\hat\tau = (\hat\tau_0, \ldots, \hat\tau_{k+1})^t$, which are the approximate least squares estimates (in the sense of (9)) of the true parameter vectors $b$ and $\tau$ given by equation (2).

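If the number of jumps is unknown, a different estimate is needed. In this case, assume that the penalized least squares estimate $\hat f_{\lambda_n}$ satisfies $\hat f_{\lambda_n} \in \mathcal{T}_{\infty,R}$ and is defined as any solution of

$$\|\Phi \hat f_{\lambda_n} - Y\|_n^2 + \lambda_n J_\#(\hat f_{\lambda_n}) \le \min_{g \in \mathcal{T}_{\infty,R}} \big(\|\Phi g - Y\|_n^2 + \lambda_n J_\#(g)\big) + o_P(n^{-1}), \qquad (11)$$

where $\lambda_n > 0$ is some smoothing parameter such that $\lambda_n \to 0$ as $n \to \infty$.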
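
The penalized estimator (11) can be sketched for a toy problem as follows. This is not the authors' algorithm. We assume a Gaussian kernel with known standard deviation $s$, restrict the jump locations to a finite grid and the number of jumps to $k \le 2$, drop the bound $\|g\|_\infty \le R$, and take the exact grid minimum instead of an $o_P(n^{-1})$ approximation. For fixed jumps, $\Phi g$ is linear in the heights, so the heights follow from ordinary least squares, and the penalty $\lambda_n J_\#(g)$ then selects the number of jumps. The choice $\lambda_n = n^{-1/2}$ satisfies $\lambda_n \to 0$ and $\lambda_n n^{1/(1+\epsilon)} \to \infty$ for $\epsilon \in (0,1)$. All names, data and parameter values are hypothetical.

```python
import itertools
import math
import random


def gauss_cdf(u):
    # Standard normal CDF; math.erf(+/-inf) = +/-1, so infinite jump locations are fine.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def column(x, a, b, s):
    # (Phi 1_[a,b))(x) for a Gaussian kernel with (assumed known) standard deviation s.
    return gauss_cdf((x - a) / s) - gauss_cdf((x - b) / s)


def solve(M, v):
    # Gaussian elimination with partial pivoting; only used for tiny normal equations.
    n = len(v)
    A = [list(M[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    sol = [0.0] * n
    for r in reversed(range(n)):
        sol[r] = (A[r][n] - sum(A[r][j] * sol[j] for j in range(r + 1, n))) / A[r][r]
    return sol


def fit_heights(xs, ys, jumps, s):
    # For fixed jumps, Phi g is linear in the heights b: solve the normal equations.
    taus = [-math.inf, *jumps, math.inf]
    cols = [[column(x, taus[i], taus[i + 1], s) for x in xs] for i in range(len(taus) - 1)]
    G = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    r = [sum(p * y for p, y in zip(ci, ys)) for ci in cols]
    b = solve(G, r)
    fitted = [sum(bi * ci[t] for bi, ci in zip(b, cols)) for t in range(len(xs))]
    rss = sum((y - fv) ** 2 for y, fv in zip(ys, fitted)) / len(xs)  # ||Phi g - Y||_n^2
    return b, rss


def penalized_lse(xs, ys, s, lam, k_max=2, grid=None):
    # Minimise ||Phi g - Y||_n^2 + lam * J_#(g) over at most k_max jumps on a grid.
    grid = grid or [j / 40 for j in range(1, 40)]
    best = None
    for k in range(k_max + 1):
        for jumps in itertools.combinations(grid, k):
            b, rss = fit_heights(xs, ys, jumps, s)
            crit = rss + lam * (k + 1)  # J_#(g) = k + 1 if all k jumps are genuine
            if best is None or crit < best[0]:
                best = (crit, k, list(jumps), b)
    return best[1], best[2], best[3]


# Hypothetical toy data: f = 2 * 1_[0.4, inf), Gaussian blur with s = 0.05, sigma = 0.1.
random.seed(1)
n, s, sigma = 150, 0.05, 0.1
xs = [(i + 0.5) / n for i in range(n)]
ys = [2.0 * column(x, 0.4, math.inf, s) + random.gauss(0.0, sigma) for x in xs]
k_hat, tau_hat, b_hat = penalized_lse(xs, ys, s, lam=n ** -0.5)
print(k_hat, tau_hat, b_hat)
```

In a serious implementation the grid search would be replaced by a continuous optimisation over the jump locations, because (9) and (11) are formulated over all of $\mathcal{T}_{k,R}$ and $\mathcal{T}_{\infty,R}$.
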
Asymptotic results. Before we state the main result, we first define the map $\nu: [0,1] \to \mathbb{R}^{2k+1}$ by

$$\nu(x) = \begin{pmatrix} A_\phi(x, \tau_0, \tau_1) \\ (b_1 - b_2) A_\phi(x, \tau_1, \tau_1) \\ A_\phi(x, \tau_1, \tau_2) \\ \vdots \\ (b_k - b_{k+1}) A_\phi(x, \tau_k, \tau_k) \\ A_\phi(x, \tau_k, \tau_{k+1}) \end{pmatrix}, \qquad (12)$$

and the $(2k+1) \times (2k+1)$ matrix $V$ by its entries

$$(V)_{ij} = \int_0^1 \big(\nu(x)\nu(x)^t\big)_{ij}\, h(x)\,dx. \qquad (13)$$

Here $h$ is the design density given by Assumption C. Now we are able to formulate the asymptotic result for the least squares estimator.

Theorem 3.1. Suppose Assumptions A, B and C are met. Let $\nu$ and $V$ be given by (12) and (13), respectively. Set $\theta$ as the parameter vector of $f$ given in (3), and $\hat\theta_n$ as the corresponding vector of estimates defined by (10). Given (9) and model (1), then

(i) $\sqrt{n}(\hat\theta_n - \theta) \to N(0, \sigma^2 V^{-1})$ in distribution.

Moreover,

(ii) $\|\Phi f - \Phi \hat f_n\|_2 = O_P(n^{-1/2})$.
(iii) $d(J(f), J(\hat f_n)) = O_P(n^{-1/2})$.
(iv) $\|f - \hat f_n\|_2 = O_P(n^{-1/4})$.
(v) $V$ is positive definite.
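
As an illustration of (12) and (13), the following sketch approximates $V$ for one jump ($k = 1$), a Gaussian kernel with standard deviation $S = 0.05$ and the uniform design density $h \equiv 1$. These choices, the midpoint rule and all names are assumptions made only for this example. A successful Cholesky factorisation confirms numerically that $V$ is positive definite, in line with part (v), and $\sigma^2 V^{-1}$ is then the limiting covariance in part (i).

```python
import math
from typing import Callable, List, Sequence

S = 0.05  # hypothetical kernel bandwidth: phi is the N(0, S^2) density


def gauss_cdf(u: float) -> float:
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def A_phi(x: float, a: float, b: float) -> float:
    # Definition (7) for the Gaussian kernel with standard deviation S.
    if a == b:
        z = (x - a) / S
        return math.exp(-0.5 * z * z) / (S * math.sqrt(2.0 * math.pi))
    return gauss_cdf((x - a) / S) - gauss_cdf((x - b) / S)


def nu(x: float, b: Sequence[float], tau: Sequence[float]) -> List[float]:
    # (12): entries ordered as theta = (b_1, tau_1, b_2, ..., tau_k, b_{k+1}).
    taus = [-math.inf, *tau, math.inf]
    out = [A_phi(x, taus[0], taus[1])]
    for i in range(1, len(taus) - 1):
        out.append((b[i - 1] - b[i]) * A_phi(x, taus[i], taus[i]))
        out.append(A_phi(x, taus[i], taus[i + 1]))
    return out


def matrix_V(b, tau, h: Callable[[float], float] = lambda x: 1.0, m: int = 2000):
    # (13) approximated by the midpoint rule with m cells on [0, 1].
    d = 2 * len(tau) + 1
    V = [[0.0] * d for _ in range(d)]
    for j in range(m):
        x = (j + 0.5) / m
        v, w = nu(x, b, tau), h(x) / m
        for p in range(d):
            for q in range(d):
                V[p][q] += v[p] * v[q] * w
    return V


def cholesky(M):
    # Lower-triangular L with M = L L^t; raises ValueError unless M is positive definite.
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][r] * L[j][r] for r in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L


V = matrix_V(b=[0.0, 2.0], tau=[0.4])   # one jump of height 2 at 0.4, uniform design
L = cholesky(V)                          # succeeds, in line with Theorem 3.1 (v)
print([[round(v, 4) for v in row] for row in V])
```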

The following theorem implies that the penalized and the restricted least squares estimates asymptotically coincide, i.e. the number of jumps in $\mathcal{T}_\infty$ is asymptotically estimated correctly with probability one. In this sense the results of Theorem 3.1 can be applied to the penalized estimate $\hat f_{\lambda_n}$.

Theorem 3.2. Suppose condition (A1), (11) and the assumptions of Theorem 3.1 are satisfied. If $\lambda_n \to 0$ and $\lambda_n n^{1/(1+\epsilon)} \to \infty$ for some $\epsilon > 0$ as $n \to \infty$, then

$$\lim_{n \to \infty} P\big(\#J(\hat f_{\lambda_n}) = \#J(f)\big) = 1.$$

The proofs of Theorems 3.1 and 3.2 can be outlined as follows. For a known number of jumps, an entropy argument yields consistency of the least squares estimator. The estimator can be represented as the minimizer of a stochastic process, which allows for a local stochastic expansion. This expansion can be used to derive asymptotic normality. If the number of jumps is unknown, an imitation of techniques from empirical process theory shows that, for a suitable choice of the smoothing parameter, the case of an unknown number of jumps can asymptotically be reduced to the case where this number is known. The details of the proofs are given in several steps in Section 4.
The next theorem states that the rate given above is optimal in a minimax sense.

Theorem 3.3. Suppose Assumption B is met and $\varepsilon_1, \ldots, \varepsilon_n$ are independent identically distributed normal random variables with zero mean and positive variance. Set

$$\Theta = \Big\{\theta = (b_1, \tau_1, b_2, \tau_2, \ldots, b_k, \tau_k, b_{k+1}) : f_\theta(\cdot) := \sum_{i=1}^{k+1} b_i 1_{[\tau_{i-1}, \tau_i)}(\cdot) \in \mathcal{T}_{k,R}\Big\}.$$

For arbitrary fixed design points $x_1, \ldots, x_n \in [0,1]$ denote by $P_\theta$ the probability measure associated with the observations

$$Y_i = (\Phi f_\theta)(x_i) + \varepsilon_i, \qquad i = 1, \ldots, n.$$

Then there exists some $c_0 > 0$ independent of $n$ and $x_1, \ldots, x_n$ such that

$$\inf_{\hat\theta} \sup_{\theta \in \Theta} P_\theta\big(\|\hat\theta - \theta\|_2 > c_0 n^{-1/2}\big) > 0.$$

The proof is given in Section 6.
Remarks and Extensions

Remark 1 (Adaptive sampling). Theorem 3.3 states for any fixed bounded kernel $\phi$ and any choice of design points that rates of convergence faster than $n^{-1/2}$ are not possible. This is intuitively clear, as the convolution "spreads" the information on the jump location over the whole interval. As a consequence, adaptive sampling schemes (where the sampling point $x_i$ may depend on the data $Y_1, \ldots, Y_{i-1}$) cannot lead to a rate of convergence faster than $n^{-1/2}$. This is in strict contrast to the case of direct regression ($\Phi = \mathrm{Id}$), where any polynomial rate of convergence can be achieved by an adaptive scheme (Lan et al. (2007)).



Remark 2 (Noisy Fredholm equations). All results of this chapter can also be shown for more general integral operators of the type $\Phi f = \int K(x,y) f(y)\,dy$ with continuous kernel $K: [0,1] \times \mathbb{R} \to \mathbb{R}$ satisfying $\sup_{x \in [0,1]} \|K(x, \cdot)\| < \infty$. In this case, $\phi(x - y)$ in definition (7) has to be replaced by $K(x, y)$. Assumption B can be formulated in the same way.

Remark 3 (Singular kernels). If the assumption of the boundedness of the integral kernel is dropped, rates faster than $O_P(n^{-1/2})$ can be achieved for estimating the jump location. Indeed, if $\phi$ is an Abel-type kernel $\phi_\alpha(x) = x^{-\alpha} 1_{(0,\infty)}(x)$ for $\alpha \in (0,1)$, then a jump can be recovered at a rate of $O_P(n^{-1/\min(2, 3-2\alpha)})$. Given a uniform design, these rates are minimax. For details see Boysen (2006), chapter 8.2. We mention that in this case adaptive sampling can improve the rate of convergence, similarly to the direct case but in contrast to a bounded kernel (cf. again Remark 1). Interestingly, the $n^{-1}$ sampling rate is achieved as $\alpha \to 1$, which is well known to be the best possible rate in direct regression for the estimation of a jump. Hence, singular kernels with a spike at least as strong as $|x|^{-1}$ already localize jumps at the same rate as in the direct case, which is achieved as $\alpha \to \infty$.

Note that the "elbow" in the rates of convergence occurs at $\alpha = 1/2$, and that the $n^{-1/2}$ rate holds for the case where $\phi_\alpha$ is square integrable on bounded intervals.

This corresponds to findings of Neumann (1997) and Goldenshluger et al. (2006c), who also observe an elbow at $\beta = 1/2$ in the rate of convergence for recovering a change point in an inverse problem, if the Fourier transform of $\phi(x)$ decreases at a rate of $|x|^{-\beta}$. Goldenshluger et al. (2006c) give a rate of $O_P(n^{-1/\min(2, 2\beta+1)})$ up to a logarithmic term if the smooth part of the function of interest is in a Paley-Wiener class. From

$$|\hat\phi_\alpha(x)| = |x|^{-1+\alpha}\,\Gamma(1 - \alpha)$$

it follows that the elbow for $\beta = 1/2$ can be identified with the elbow for $\alpha = 1/2$.



4. Proof of Theorems 3.1 and 3.2

We start with some technical lemmata and give some entropy results on the spaces of interest, which are required to apply tools of empirical process theory to prove consistency of the estimates. Afterwards we give a local stochastic expansion of the minimized process and use it to derive asymptotic normality. Finally, we again imitate some techniques from empirical process theory to show that the penalized estimate asymptotically coincides with the restricted least squares estimate. Note that Assumption B is needed to ensure identifiability as well as positive definiteness of the asymptotic covariance matrix $V$.



4.1. Some technical lemmata

In order to gain some insight into the model, it is useful to take a closer look at the implications of Assumption B for the mapping $\Phi$ restricted to the space of step functions. The following lemma collects some properties of this mapping.

Lemma 4.1. Given Assumption B, the following holds true.

(i) For all $\epsilon > 0$ there exists $0 < C_0 < \infty$ such that for all $f \in \mathcal{T}_\infty$

$$\|\Phi f\|_2^2 \le C_0 \|f\|^2_{L_2([-\epsilon, 1+\epsilon])}.$$

(ii) For all $\epsilon > 0$ the map $\Phi: (\mathcal{T}_k, \|\cdot\|_{L_2([-\epsilon, 1+\epsilon])}) \to L_2([0,1])$ is continuous.
(iii) $\Phi: \mathcal{T}_k \to L_2([0,1])$ is one-to-one.
(iv) The function $\Phi f$ is Lipschitz continuous on $\mathbb{R}$ for all $f \in \mathcal{T}_\infty$.

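Proof. By Assumption B we have that $\|\phi\|_\infty = C < \infty$. For $f \in \mathcal{T}_\infty$ note that $f$ is constant on $(-\infty, 0)$ and on $[1, \infty)$; denote these constant values by $f|_{(-\infty,0)}$ and $f|_{[1,\infty)}$. Bounding the parts of the convolution integral over $[0,1]$ and over $\mathbb{R} \setminus [0,1]$ separately gives

$$\begin{aligned} \|\Phi f\|_2^2 &\le C^2 \|f\|_2^2 + \|\phi\|^2_{L_2(\mathbb{R})} \big(f|_{(-\infty,0)}\big)^2 + \|\phi\|^2_{L_2(\mathbb{R})} \big(f|_{[1,\infty)}\big)^2 \\ &= C^2 \|f\|_2^2 + \|\phi\|^2_{L_2(\mathbb{R})} \Big(\frac{1}{\epsilon}\int_{-\epsilon}^0 f^2(y)\,dy + \frac{1}{\epsilon}\int_1^{1+\epsilon} f^2(y)\,dy\Big) \\ &\le C_0 \|f\|^2_{L_2([-\epsilon, 1+\epsilon])} \end{aligned}$$

for some $C_0$ depending on $\phi$ and $\epsilon$ only. This proves (i).

Similarly we can show $\|\Phi f\|_2 \le C_1 \|f\|_{L_2([-\epsilon, 1+\epsilon])}$ for $f \in \mathcal{T}_k$, which gives continuity and hence (ii). As argued in the part on the assumptions on the kernel in Section 2.2, (iii) follows from the independence of the functions $A_\phi(\cdot, \tau_j, \tau_{j+1})$.

To prove (iv) note that

$$|(\Phi 1_{[a,b)})(x) - (\Phi 1_{[a,b)})(x + \delta)| = \Big|\int_{x-b}^{x-a} \phi(y)\,dy - \int_{x+\delta-b}^{x+\delta-a} \phi(y)\,dy\Big| \le 2|\delta|\,\|\phi\|_\infty$$

for any $x, \delta \in \mathbb{R}$ and $a, b \in \mathbb{R} \cup \{-\infty, \infty\}$. For $f \in \mathcal{T}_\infty$ with $\#J(f) < \infty$, this gives $|(\Phi f)(x) - (\Phi f)(x + \delta)| \le |\delta|\,\big(2\,\#J(f)\,\|f\|_\infty \|\phi\|_\infty\big)$. □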
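
The Lipschitz bound in (iv) can be checked numerically. The sketch below assumes a Gaussian kernel with standard deviation 0.05, so that $\|\phi\|_\infty = 1/(0.05\sqrt{2\pi})$, together with a hypothetical step function that has two jumps. It compares difference quotients of $\Phi f$ with the constant $2\,\#J(f)\,\|f\|_\infty \|\phi\|_\infty$.

```python
import math
from typing import Sequence

S = 0.05  # hypothetical kernel bandwidth: phi is the N(0, S^2) density


def gauss_cdf(u: float) -> float:
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def blurred_step(x: float, b: Sequence[float], tau: Sequence[float]) -> float:
    # (Phi f)(x) = sum_i b_i [F((x - tau_{i-1}) / S) - F((x - tau_i) / S)]
    taus = [-math.inf, *tau, math.inf]
    return sum(bi * (gauss_cdf((x - taus[i]) / S) - gauss_cdf((x - taus[i + 1]) / S))
               for i, bi in enumerate(b))


b, tau = [0.0, 2.0, -1.0], [0.3, 0.7]                    # hypothetical step function
sup_phi = 1.0 / (S * math.sqrt(2.0 * math.pi))          # ||phi||_inf
n_jumps = sum(1 for i in range(len(tau)) if b[i] != b[i + 1])
lip = 2 * n_jumps * max(abs(v) for v in b) * sup_phi     # constant from Lemma 4.1 (iv)

grid = [j / 1000 for j in range(-200, 1201)]
worst = max(abs(blurred_step(x + d, b, tau) - blurred_step(x, b, tau)) / d
            for x in grid for d in (1e-3, 1e-2, 1e-1))
print(worst, lip)   # the observed difference quotients stay below the bound
```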



The following lemma provides a link between the empirical and the $L_2$ norm.

Lemma 4.2. Suppose Assumption C is satisfied and $f$ is piecewise Lipschitz continuous on $[0,1]$, i.e. there exists a partition $I_1, \ldots, I_k$, $k < \infty$, with $\bigcup_{j=1}^k I_j = [0,1]$ and $I_j \cap I_r = \emptyset$ for $j \ne r$, such that $f|_{I_j}$ is Lipschitz for all $j = 1, \ldots, k$. Then

$$\int_0^1 f(x) h(x)\,dx = \frac{1}{n}\sum_{i=1}^n f(x_i) + O_P(n^{-1/2}).$$

If additionally Assumption B is met, then

$$\|\Phi f\|_2^2 = O_P\big(\|\Phi f\|_n^2\big).$$

Proof. The proof is straightforward. For details see Boysen (2006), Lemma 7.2.

□
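
The first statement of Lemma 4.2 is a law of large numbers with a $\sqrt{n}$ rate. The following quick Monte Carlo sketch assumes a uniform random design, so that $h \equiv 1$, and uses a hypothetical step function with $\int_0^1 f = 1.3$. The seed and the sample size are arbitrary choices.

```python
import random

random.seed(0)
n = 10_000
xs = [random.random() for _ in range(n)]   # uniform design, h = 1 on [0, 1]


def f(x: float) -> float:
    # Hypothetical step function 1_[0.3, 1) + 2 * 1_[0.7, 1); int_0^1 f = 0.7 + 0.6 = 1.3
    return (1.0 if x >= 0.3 else 0.0) + (2.0 if x >= 0.7 else 0.0)


emp_mean = sum(f(x) for x in xs) / n
print(emp_mean, 1.3, abs(emp_mean - 1.3) * n ** 0.5)  # the last value is typically of order 1
```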



4.2. Entropy results

To show consistency of the estimates, we wish to apply results from empirical process theory. To this end, let us first introduce some additional notation (cf. van de Geer (2000)).

Given a measure $Q$, a set of $Q$-measurable functions $\mathcal{G}$ and a real number $\delta > 0$, define the $\delta$-covering number $N(\delta, \mathcal{G}, Q)$ as the smallest value of $N$ for which there exist functions $g_1, \ldots, g_N$ such that for every $g \in \mathcal{G}$ there is a $j \in \{1, \ldots, N\}$ with

$$\Big(\int (g - g_j)^2\,dQ\Big)^{1/2} \le \delta.$$

Moreover, define the $\delta$-entropy $H$ of $\mathcal{G}$ as

$$H(\delta, \mathcal{G}, Q) = \log N(\delta, \mathcal{G}, Q).$$

If $Q$ is the Lebesgue measure we will write $H(\delta, \mathcal{G})$ and $N(\delta, \mathcal{G})$ instead of $H(\delta, \mathcal{G}, Q)$ and $N(\delta, \mathcal{G}, Q)$. Given design points $x_1, \ldots, x_n \in \mathbb{R}$, the empirical measure will be denoted by $Q_n = n^{-1}\sum_{i=1}^n \delta_{x_i}$, where $\delta_{x_i}$ is the Dirac measure at $x_i$. Note that $\|\cdot\|_n$ is the norm corresponding to the space $L_2(\mathbb{R}, Q_n)$.
Finally, define the entropy integral

$$J(\delta, \mathcal{G}, Q) := \max\Big(\delta, \int_0^\delta H^{1/2}(u, \mathcal{G}, Q)\,du\Big).$$

Note that for our purposes the relevant quantity is the entropy of the space $\mathcal{G}_{k,R} = \{\Phi f : f \in \mathcal{T}_{k,R}\}$. However, it is convenient to first calculate the entropy of $(\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})$ and then use Lemma 4.1 to draw conclusions about the space $\mathcal{G}_{k,R}$.

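Lemma 4.3. For $-\infty < a < b < \infty$ there exists a constant $C > 0$ independent of $\delta$, $k$ and $n$, such that

$$H\big(\delta, (\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})\big) \le C(k+1)\Big(1 + \log\Big(\frac{R(k+1)}{\delta}\Big)\Big).$$

Proof. Define the sets

$$\Delta(\delta) = \big\{-R + m c_2 \delta : m = 0, \ldots, \lceil 2R(c_2\delta)^{-1} \rceil\big\}$$

and

$$\Gamma(\delta) = \big\{a + m c_1 \delta^2 : m = 1, \ldots, \lceil (b - a)(c_1\delta^2)^{-1} \rceil\big\},$$

where $c_1, c_2$ will be defined later. Define the function class $\mathcal{H}(\delta)$ as

$$\mathcal{H}(\delta) = \Big\{g : g(x) = \sum_{i=1}^{k+1} b_i 1_{[\gamma_{i-1}, \gamma_i)}(x) : b_i \in \Delta(\delta),\ i = 1, \ldots, k+1,\ \gamma_0 = a,\ \gamma_{k+1} = b,\ \gamma_i \in \Gamma(\delta),\ \gamma_i < \gamma_{i+1},\ i = 1, \ldots, k\Big\}.$$

Now for $g_0 \in \mathcal{T}_{k,R}$ we can choose $g \in \mathcal{H}(\delta)$ such that each jump of $g$ lies within distance $c_1\delta^2/2$ of the corresponding jump of $g_0$ (in particular $d(J(g), J(g_0)) \le c_1\delta^2/2$), and such that $(g_0(x) - g(x))^2 \le c_2^2\delta^2/4$ for all $x \in [a,b]$ outside the intervals between corresponding jumps. Since $g_0$ has at most $k$ jumps between $a$ and $b$, these intervals have total length at most $k c_1 \delta^2/2$, and we get

$$\|g_0 - g\|^2_{L_2([a,b])} \le (b - a)\frac{c_2^2 \delta^2}{4} + k(2R)^2 \frac{c_1 \delta^2}{2}.$$

Choosing $c_1 = (4kR^2)^{-1}$ and $c_2 = (b - a)^{-1/2}$ gives $\|g_0 - g\|_{L_2([a,b])} \le \delta$. Hence $\mathcal{H}(\delta)$ is a $\delta$-covering of $(\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})$. Since

$$\#\mathcal{H}(\delta) \le \big(\#\Gamma(\delta)\big)^k \big(\#\Delta(\delta)\big)^{k+1} \le \Big(\frac{(b - a)4kR^2}{\delta^2} + 1\Big)^k \Big(\frac{2R(b - a)^{1/2}}{\delta} + 2\Big)^{k+1},$$

taking logarithms proves the claim.

□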
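
The covering construction in this proof can be imitated directly. The idea is to snap the heights of $g_0$ to $\Delta(\delta)$ and its jumps to $\Gamma(\delta)$, using the constants $c_1 = (4kR^2)^{-1}$ and $c_2 = (b-a)^{-1/2}$, and then to compute the exact $L_2([a,b])$ distance. The Python sketch below does this for random step functions with $k = 3$ jumps on $[0,1]$. In the code the interval is written $[a, c]$ so that its end point does not clash with the heights $b$. The sketch rounds to the nearest grid point and does not enforce $m \ge 1$ or strict ordering of the snapped jumps, and the snapped heights may exceed $R$ by half a grid step. It is therefore an illustration, not a faithful implementation of $\mathcal{H}(\delta)$.

```python
import math
import random
from typing import List, Sequence, Tuple


def value(b: Sequence[float], t: Sequence[float], x: float) -> float:
    # Step function with heights b (len(t) + 1 of them) and sorted jumps t.
    return b[sum(1 for tj in t if tj <= x)]


def l2_dist_sq(f1, f2, a: float, c: float) -> float:
    # Exact squared L2([a, c]) distance of two step functions given as (b, t).
    pts = sorted({a, c, *(t for t in list(f1[1]) + list(f2[1]) if a < t < c)})
    return sum((value(*f1, 0.5 * (lo + hi)) - value(*f2, 0.5 * (lo + hi))) ** 2 * (hi - lo)
               for lo, hi in zip(pts[:-1], pts[1:]))


def round_to_cover(b, t, a, c, R, delta) -> Tuple[List[float], List[float]]:
    # Snap heights to the grid Delta(delta) and jumps to the grid Gamma(delta).
    k = max(len(t), 1)
    c1, c2 = 1.0 / (4 * k * R * R), 1.0 / math.sqrt(c - a)
    hstep, gstep = c2 * delta, c1 * delta ** 2
    b_r = [-R + round((bi + R) / hstep) * hstep for bi in b]
    t_r = [a + round((ti - a) / gstep) * gstep for ti in t]
    return b_r, t_r


def log_cover_size(k, a, c, R, delta) -> float:
    # log of (#Gamma)^k (#Delta)^(k+1), an upper bound for log N(delta).
    c1, c2 = 1.0 / (4 * max(k, 1) * R * R), 1.0 / math.sqrt(c - a)
    n_gamma = math.ceil((c - a) / (c1 * delta ** 2))
    n_delta = math.ceil(2 * R / (c2 * delta)) + 1
    return k * math.log(n_gamma) + (k + 1) * math.log(n_delta)


random.seed(3)
a, c, R, k, delta = 0.0, 1.0, 1.0, 3, 0.1   # hypothetical setting
worst = 0.0
for _ in range(200):
    t = sorted(random.uniform(a, c) for _ in range(k))
    b = [random.uniform(-R, R) for _ in range(k + 1)]
    worst = max(worst, math.sqrt(l2_dist_sq((b, t), round_to_cover(b, t, a, c, R, delta), a, c)))
print(worst, delta, log_cover_size(k, a, c, R, delta))
```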



imsart-generic ver. 2008/01/24 file: ejs_2008_204.tex date: March 14, 2008 



L. Boysen and A. Munk/ Jumps in inverse regression 19 

Lemma 4.3 directly gives that $(\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})$ is totally bounded for $-\infty < a < b < \infty$. Note that $(\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})$ also contains functions with fewer than $k$ jumps and hence is closed. Consequently, it is compact.

Corollary 4.4. The space $(\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})$ is compact for all $a$, $b$ satisfying $-\infty < a < b < \infty$.

We will now use the assumptions on the operator $\Phi$ or, to be more precise, Lemma 4.1 to deduce bounds on the entropy of the space

$$\mathcal{G}_{k,R}(\Phi) := \{\Phi f : f \in \mathcal{T}_{k,R}\}.$$

Corollary 4.5. Assume $\Phi$ satisfies Assumption B. There exists a constant $C_2$ independent of $n$, $k$ and $R$ such that

$$H(\delta, \mathcal{G}_{k,R}(\Phi), Q_n) \le C_2(k+1)\Big(1 + \log\Big(\frac{R(k+1)}{\delta}\Big)\Big).$$

Proof. By Lemma 4.1 (i) there exist $-\infty < a < b < \infty$ and $0 < C_0 < \infty$ such that

$$\|\Phi f - \Phi g\|_n \le C_0 \|f - g\|_{L_2([a,b])}$$

for $f, g \in \mathcal{T}_k$. Assume $\mathcal{H}(\delta)$ is a $\delta$-covering of $(\mathcal{T}_{k,R}, \|\cdot\|_{L_2([a,b])})$ for every $\delta > 0$. Then $\{\Phi g : g \in \mathcal{H}(\delta/C_0)\}$ is a $\delta$-covering of $\mathcal{G}_{k,R}(\Phi)$. Consequently, the claim follows from Lemma 4.3. □



Again, this implies that the space $\mathcal{G}_{k,R}(\Phi)$ equipped with the empirical norm $\|\cdot\|_n$ is compact. Consequently, the functional $\|\cdot - Y\|_n$ has a minimizer in $\mathcal{G}_{k,R}(\Phi)$ for every $k$. As $\lambda_n J_\#(\cdot)$ is strictly increasing in the number of jumps for every $\lambda_n > 0$, this implies the following lemma.

Lemma 4.6. For each $\lambda_n > 0$ the functional $\|\cdot - Y\|_n^2 + \lambda_n J_\#(\cdot)$ has a minimizer in $\mathcal{G}_{\infty,R}(\Phi)$.
4.3. Consistency

To deduce consistency of the jump estimates from the $L_2$ consistency of the function estimator, we need a result on how $d(J(f), J(g))$ depends on the $L_2$ distance of $f$ and $g$. This is given by the following lemma.

Lemma 4.7. Assume f, g G T^. Then 

4||/ -.9111 



d(J(f),J(g))< 



(mm{\f(t + )-f(t-)\:t£j(fW 
Proof. Let t G J(f) and 7 G J(g), such that |t - -y| = d(J(f), J(g))- Then 

'mm{\f(t + )-f(t_)\:teJ(f)W 



\\f-9U>\r-7\ 

which proves the assertion. □ 

In order to show consistency of $\hat f_n$, we first prove the consistency of $\Phi\hat f_n$. To this end we require the following result, which follows directly from the proof of Theorem 4.8, page 56 in van de Geer (2000).

Lemma 4.8. Assume $\varepsilon_1,\dots,\varepsilon_n$ are i.i.d. with mean zero and $E(\varepsilon_1^2)=\sigma^2<\infty$. Set $\mathcal G_n(R)=\{g\in\mathcal G:\|g\|_n\le R\}$ and suppose that

$$\frac1nH(\delta,\mathcal G_n(R),Q_n)\to0\quad\text{for all }\delta>0,\ R>0.$$

Then

$$\sup_{g\in\mathcal G_n(R)}|\langle\varepsilon,g\rangle_n|=\sup_{g\in\mathcal G_n(R)}\Bigl|\frac1n\sum_{i=1}^n\varepsilon_ig(x_i)\Bigr|=o_P(1)$$

for every $R>0$.

Now we are able to prove consistency of $\hat f_n$.

Lemma 4.9. Suppose Assumptions A, B and C are met. Then $\Phi^{-1}$ is continuous as a mapping from $\{\Phi f:f\in\mathcal T_{k,R}\}\subset L_2([0,1])$ to the space $(\mathcal T_{k,R},\|\cdot\|_{L_2([-\epsilon,1+\epsilon])})$ for all $k\in\mathbb N$, $R>0$ and a suitable $\epsilon>0$. Moreover, $\|\Phi f-\Phi\hat f_n\|_2=o_P(1)$ and consequently

$$\|f-\hat f_n\|_2=o_P(1).\tag{14}$$
Proof. Use the definition (9) of $\hat f_n$ and $Y=\Phi f+\varepsilon$ to obtain

$$\|\Phi\hat f_n-\Phi f\|_n^2\le2\langle\Phi(\hat f_n-f),\varepsilon\rangle_n+o(n^{-1})\le2\sup_{g\in\mathcal G_{2k,2R}(\Phi)}|\langle g,\varepsilon\rangle_n|+o(n^{-1}),$$

since $f-\hat f_n\in\mathcal T_{2k,2R}$. By Corollary 4.5,

$$n^{-1}H(\delta,\mathcal G_{2k,2R}(\Phi),Q_n)\to0\quad\text{for all }\delta>0.$$

Hence Lemma 4.8 gives

$$\sup_{g\in\mathcal G_{2k,2R}(\Phi)}|\langle g,\varepsilon\rangle_n|=o_P(1).$$

This proves $\|\Phi f-\Phi\hat f_n\|_n=o_P(1)$. Application of Lemma 4.2 yields

$$\|\Phi f-\Phi\hat f_n\|_2=o_P(1).\tag{15}$$

Note that $\Phi$ is a linear operator and $f-\hat f_n\in\mathcal T_{2k,2R}$. By Corollary 4.4 the space $(\mathcal T_{2k,2R},\|\cdot\|_{L_2([-\epsilon,1+\epsilon])})$ is compact for each $\epsilon>0$. Lemma 4.1 (iii) and (ii) yield that there exists an $\epsilon>0$ such that the map

$$\Phi:(\mathcal T_{2k,2R},\|\cdot\|_{L_2([-\epsilon,1+\epsilon])})\to L_2([0,1])$$

is continuous and one-to-one.

The inverse of a continuous injective mapping $F$, restricted to the image $F(\Omega)$, is continuous if $\Omega$ is compact. This gives continuity of $\Phi^{-1}$ as a mapping from $\{\Phi f:f\in\mathcal T_{2k,2R}\}\subset L_2([0,1])$ to $(\mathcal T_{2k,2R},\|\cdot\|_{L_2([-\epsilon,1+\epsilon])})$. Hence $\|\Phi f\|_2\to0$ implies $\|f\|_{L_2([-\epsilon,1+\epsilon])}=\|\Phi^{-1}\Phi f\|_{L_2([-\epsilon,1+\epsilon])}\to0$ for $f\in\mathcal T_{2k,2R}$. Consequently, (15) implies

$$\|f-\hat f_n\|_2\le\|f-\hat f_n\|_{L_2([-\epsilon,1+\epsilon])}=o_P(1).\quad□$$

This allows us to infer the consistency of the parameter estimates. The following corollary is a direct consequence of Lemmas 4.7 and 4.9.

Corollary 4.10. Suppose the prerequisites of Lemma 4.9 are met. Then

$$d(J(f),J(\hat f_n))=o_P(1),$$

as well as $\#J(f)=\#J(\hat f_n)$. Moreover, write $f=\sum_{i=1}^{k+1}b_i1_{[\tau_{i-1},\tau_i)}$ and $\hat f_n=\sum_{i=1}^{k+1}\hat b_i1_{[\hat\tau_{i-1},\hat\tau_i)}$. Then the estimates $\hat b_i$ of the levels $b_i$ satisfy

$$\max_i|\hat b_i-b_i|=o_P(1).$$



4.4. Asymptotic normality

To show asymptotic normality of M-estimators, it is common to assume that the function being minimized is differentiable. However, since $\phi$ is allowed to have discontinuities, a less restrictive result is needed. As discussed in Section 5.3 of van der Vaart (1998), it suffices to assume a second-order Taylor-type expansion. Following this idea, the next theorem gives the asymptotic normality of the minimizer of a process $Z_n(\theta)$, provided the process admits a certain expansion. It is similar to Theorem 5.23 of van der Vaart (1998). Unlike that result, it also covers non-i.i.d. random variables, which the fixed design requires.

Theorem 4.11. Assume $\Theta\subset\mathbb R^d$ is open and $\theta_0\in\Theta$. Let $(Z_n(\theta))_{\theta\in\Theta}$ be a stochastic process. Assume there exist a sequence of random variables $(W_n)_{n\in\mathbb N}\subset\mathbb R^d$ and a positive definite matrix $V\in\mathbb R^{d\times d}$ such that

$$Z_n(\theta_0+\Delta)=Z_n(\theta_0)-2n^{-1/2}\Delta^tW_n+\Delta^tV\Delta+R_n(\Delta)\tag{16}$$

with

$$\sup_{\|\Delta\|\le\delta}\frac{|R_n(\Delta)|}{\|\Delta\|^2+n^{-1}}\xrightarrow{P}0\quad\text{as }n\to\infty,\ \delta\to0,\tag{17}$$

as well as

$$W_n\xrightarrow{\mathcal D}N(0,\Gamma).$$

Suppose $\hat\theta_n$ is a consistent estimator of $\theta_0$ and an approximate minimizer of $Z_n$, i.e.

$$\|\hat\theta_n-\theta_0\|=o_P(1)\quad\text{and}\quad Z_n(\hat\theta_n)\le\inf_\theta Z_n(\theta)+o_P(n^{-1}).$$

Then

$$\sqrt n(\hat\theta_n-\theta_0)=V^{-1}W_n+o_P(1).$$

Proof. The proof is straightforward and similar to the case in which second derivatives exist. For details see Boysen (2006), Theorem 7.12. □
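The mechanism behind Theorem 4.11 is this: once the remainder $R_n$ is negligible, the expansion (16) is a quadratic in $\Delta$. Its minimizer is $\Delta^*=n^{-1/2}V^{-1}W_n$, so $\sqrt n(\hat\theta_n-\theta_0)\approx V^{-1}W_n$. The sketch below checks this numerically. It uses a hypothetical $2\times2$ positive definite $V$ and a fixed vector standing in for one realisation of $W_n$. Plain gradient descent serves only as an independent check of the closed form; it is not the paper's method.

```python
import numpy as np

def quadratic_part(delta, W, V, n):
    """Quadratic part of (16): -2 n^{-1/2} Delta^t W + Delta^t V Delta."""
    return -2.0 / np.sqrt(n) * delta @ W + delta @ V @ delta

def argmin_by_gradient_descent(W, V, n, steps=5000, lr=0.1):
    """Minimise the quadratic part iteratively (illustrative check only)."""
    delta = np.zeros_like(W)
    for _ in range(steps):
        grad = -2.0 / np.sqrt(n) * W + 2.0 * V @ delta
        delta = delta - lr * grad
    return delta

n = 400
V = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical positive definite matrix
W = np.array([1.0, -0.5])                # hypothetical realisation of W_n

delta_gd = argmin_by_gradient_descent(W, V, n)
delta_closed = np.linalg.solve(V, W) / np.sqrt(n)   # n^{-1/2} V^{-1} W_n
print(np.sqrt(n) * delta_gd, np.linalg.solve(V, W))
```

Rescaling by $\sqrt n$ recovers $V^{-1}W_n$, which is the form of the limit in the theorem.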



A second-order expansion for the minimized process. To derive an expansion of type (16) for our problem, we first introduce some notation. For $b,\hat b\in\mathbb R^{k+1}$ and $\tau,\hat\tau\in T_k$ set

$$g(x,b,\tau)=\sum_{j=1}^{k+1}b_j\Phi1_{[\tau_{j-1},\tau_j)}(x)$$

and

$$Z_n(\hat b,\hat\tau)=\frac1n\sum_{i=1}^n\bigl(g(x_i,b,\tau)+\varepsilon_i-g(x_i,\hat b,\hat\tau)\bigr)^2.\tag{18}$$

Assume that $f$ and the estimate $\hat f_n$ as defined by (9) are given by

$$f(x)=\sum_{i=1}^{k+1}b_i1_{[\tau_{i-1},\tau_i)}(x)\quad\text{and}\quad\hat f_n(x)=\sum_{i=1}^{k+1}\hat b_i1_{[\hat\tau_{i-1},\hat\tau_i)}(x),$$

respectively. By definition of $Z_n$,

$$Z_n(\hat b,\hat\tau)\le\min_{(\tilde b,\tilde\tau)\in[-R,R]^{k+1}\times T_k}Z_n(\tilde b,\tilde\tau)+o(n^{-1}).\tag{19}$$

To obtain an expansion for $Z_n(\hat b,\hat\tau)$, we first examine the difference $g(x,b,\tau)-g(x,\hat b,\hat\tau)$.



Lemma 4.12. Suppose Assumption B is satisfied and $\nu(x)$ is given by (13). Define $\Delta$ by

$$\Delta=(\hat b_1-b_1,\hat\tau_1-\tau_1,\hat b_2-b_2,\hat\tau_2-\tau_2,\dots,\hat\tau_k-\tau_k,\hat b_{k+1}-b_{k+1})^t.\tag{20}$$

Then

$$\begin{aligned}g(x,b,\tau)-g(x,\hat b,\hat\tau)&=\sum_{j=1}^{k+1}\bigl(b_j\Phi1_{[\tau_{j-1},\tau_j)}(x)-\hat b_j\Phi1_{[\hat\tau_{j-1},\hat\tau_j)}(x)\bigr)\\&=-\Delta^t\nu(x)+O(\|\Delta\|^2)+\sum_{j=1}^kO(|\hat\tau_j-\tau_j|)1_{\{[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset\}}.\end{aligned}$$

Note that $[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset$ means that $\phi$ has a discontinuity in the interval with endpoints $x-\tau_j$ and $x-\hat\tau_j$.

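Proof of Lemma 4.12. Recall that $\#J(\phi)<\infty$ and $\|\phi\|_\infty<\infty$.

First assume that $\hat\tau_j>\tau_j$ and that $\phi$ is continuous on $[x-\hat\tau_j,x-\tau_j]$, i.e. $J(\phi)\cap[x-\hat\tau_j,x-\tau_j]=\emptyset$. Then for all $y\in[\tau_j,\hat\tau_j]$ we have $\phi(x-y)-\phi(x-\tau_j)=O(|y-\tau_j|)$. This leads to

$$\begin{aligned}\Phi1_{[\tau_{j-1},\tau_j)}(x)-\Phi1_{[\tau_{j-1},\hat\tau_j)}(x)&=-\int_{\tau_j}^{\hat\tau_j}\phi(x-y)\,dy\\&=-(\hat\tau_j-\tau_j)\phi(x-\tau_j)-\int_{\tau_j}^{\hat\tau_j}\bigl(\phi(x-y)-\phi(x-\tau_j)\bigr)\,dy\\&=(\tau_j-\hat\tau_j)\phi(x-\tau_j)+O\Bigl(\int_{\tau_j}^{\hat\tau_j}|y-\tau_j|\,dy\Bigr)\\&=(\tau_j-\hat\tau_j)\phi(x-\tau_j)+O\bigl((\tau_j-\hat\tau_j)^2\bigr).\end{aligned}$$

If $\phi$ has a discontinuity in $[x-\hat\tau_j,x-\tau_j]$, then

$$\Phi1_{[\tau_{j-1},\tau_j)}(x)-\Phi1_{[\tau_{j-1},\hat\tau_j)}(x)=(\tau_j-\hat\tau_j)\phi(x-\tau_j)+\int_{\tau_j}^{\hat\tau_j}O(\|\phi\|_\infty)\,dy=(\tau_j-\hat\tau_j)\phi(x-\tau_j)+O(|\tau_j-\hat\tau_j|).$$

The same holds for $\hat\tau_j<\tau_j$. Note that $1_{\{[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset\}}$ is one if and only if $\phi$ has a discontinuity in the interval with endpoints $x-\hat\tau_j$ and $x-\tau_j$. Consequently,

$$\Phi1_{[\tau_{j-1},\tau_j)}(x)-\Phi1_{[\tau_{j-1},\hat\tau_j)}(x)=(\tau_j-\hat\tau_j)\phi(x-\tau_j)+O\bigl((\tau_j-\hat\tau_j)^2\bigr)+O(|\tau_j-\hat\tau_j|)1_{\{[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset\}}.$$

Similarly,

$$\Phi1_{[\tau_{j-1},\tau_j)}(x)-\Phi1_{[\hat\tau_{j-1},\tau_j)}(x)=(\hat\tau_{j-1}-\tau_{j-1})\phi(x-\tau_{j-1})+O\bigl((\tau_{j-1}-\hat\tau_{j-1})^2\bigr)+O(|\tau_{j-1}-\hat\tau_{j-1}|)1_{\{[x-\tau_{j-1},x-\hat\tau_{j-1}]\cap J(\phi)\ne\emptyset\}}.$$

Recall that $\tau_0=\hat\tau_0$ and $\tau_{k+1}=\hat\tau_{k+1}$. Combining the preceding results, we obtain

$$\begin{aligned}&\sum_{j=1}^{k+1}\bigl(b_j\Phi1_{[\tau_{j-1},\tau_j)}(x)-\hat b_j\Phi1_{[\hat\tau_{j-1},\hat\tau_j)}(x)\bigr)\\&=\sum_{j=1}^{k+1}\Bigl((b_j-\hat b_j)\Phi1_{[\tau_{j-1},\tau_j)}(x)+\hat b_j\bigl(\Phi1_{[\tau_{j-1},\tau_j)}(x)-\Phi1_{[\tau_{j-1},\hat\tau_j)}(x)\bigr)+\hat b_j\bigl(\Phi1_{[\tau_{j-1},\hat\tau_j)}(x)-\Phi1_{[\hat\tau_{j-1},\hat\tau_j)}(x)\bigr)\Bigr)\\&=\sum_{j=1}^{k+1}\Bigl((b_j-\hat b_j)\Phi1_{[\tau_{j-1},\tau_j)}(x)+\hat b_j(\tau_j-\hat\tau_j)\phi(x-\tau_j)+\hat b_j(\hat\tau_{j-1}-\tau_{j-1})\phi(x-\tau_{j-1})\Bigr)\\&\quad+O(\|\tau-\hat\tau\|^2)+\sum_{j=1}^kO(|\tau_j-\hat\tau_j|)1_{\{[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset\}}.\end{aligned}$$

Since $\hat b_j(\tau_j-\hat\tau_j)=b_j(\tau_j-\hat\tau_j)+O(\|b-\hat b\|\,\|\tau-\hat\tau\|)$, this gives

$$\begin{aligned}g(x,b,\tau)-g(x,\hat b,\hat\tau)&=\sum_{j=1}^{k+1}(b_j-\hat b_j)\Phi1_{[\tau_{j-1},\tau_j)}(x)+\sum_{j=1}^k(\tau_j-\hat\tau_j)(b_j-b_{j+1})\phi(x-\tau_j)\\&\quad+O(\|\tau-\hat\tau\|^2)+O(\|b-\hat b\|\,\|\tau-\hat\tau\|)+\sum_{j=1}^kO(\|\Delta\|)1_{\{[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset\}}.\end{aligned}$$

Since $O(\|b-\hat b\|\,\|\tau-\hat\tau\|)=O(\|\Delta\|^2)$, this proves the claim. □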
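The key step in the proof of Lemma 4.12 concerns a jump moved from $\tau_j$ to $\hat\tau_j$ away from discontinuities of $\phi$. It changes $\Phi1_{[\tau_{j-1},\tau_j)}(x)$ by $(\tau_j-\hat\tau_j)\phi(x-\tau_j)$ up to $O((\tau_j-\hat\tau_j)^2)$. Below is a minimal numerical sketch. It assumes a Gaussian kernel with a hypothetical bandwidth `S`; this kernel is continuous, so the indicator term vanishes. In that case $\Phi1_{[a,b)}(x)=\Psi((x-a)/S)-\Psi((x-b)/S)$ with $\Psi$ the standard normal distribution function. Shrinking the shift by a factor of 10 should shrink the linearisation error by roughly a factor of 100.

```python
import math

S = 0.1  # hypothetical bandwidth: phi is taken to be the N(0, S^2) density

def phi(u):
    """Gaussian kernel phi(u)."""
    return math.exp(-0.5 * (u / S) ** 2) / (S * math.sqrt(2.0 * math.pi))

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conv_indicator(x, a, b):
    """(Phi 1_[a,b))(x) = int_a^b phi(x - y) dy for the Gaussian kernel."""
    return norm_cdf((x - a) / S) - norm_cdf((x - b) / S)

def linearization_error(x, a, tau, tau_hat):
    """|exact change - first-order term (tau - tau_hat) * phi(x - tau)|."""
    exact = conv_indicator(x, a, tau) - conv_indicator(x, a, tau_hat)
    linear = (tau - tau_hat) * phi(x - tau)
    return abs(exact - linear)

x, a, tau = 0.45, 0.0, 0.5
errors = {h: linearization_error(x, a, tau, tau + h) for h in (1e-1, 1e-2, 1e-3)}
for h, err in errors.items():
    print(f"h={h:g}  error={err:.3e}  error/h^2={err / h**2:.2f}")
```

A kernel with a jump between $x-\hat\tau_j$ and $x-\tau_j$ would instead give an error of order $|\tau_j-\hat\tau_j|$. That case is the indicator term in the lemma.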

Lemma 4.13. Suppose Assumptions B and C are met. Then the process $Z_n(\hat b,\hat\tau)$ admits an expansion of type (16), namely

$$Z_n(\hat b,\hat\tau)=Z_n(b,\tau)-2n^{-1/2}\Delta^tW_n+\Delta^tV\Delta+R_n(\Delta),$$

where $R_n$ satisfies condition (17), $\Delta$ is given by (20) and $V$ is the $(2k+1)\times(2k+1)$ matrix defined by (13). Moreover,

$$W_n\xrightarrow{\mathcal D}N(0,E(\varepsilon_1^2)V).$$
Before we give the proof, we need the following result on the number of design points contained in a sequence of intervals.

Lemma 4.14. If the design points $x_1,\dots,x_n$ satisfy Assumption C, then for any two sequences $a_n,b_n$, $n\in\mathbb N$, with $0\le a_n<b_n\le1$ we have

$$n^{-1}\#\{i:x_i\in[a_n,b_n]\}=O\bigl(|b_n-a_n|+n^{-1/2}\bigr).$$

Proof. The proof is straightforward. It uses that $H(x)=\int_0^xh(y)\,dy$ is strictly monotone and that, by Assumption C, $H^{-1}(i/n)-\delta_i=x_i$ with $\max_{i=1,\dots,n}|\delta_i|=O(n^{-1/2})$. □
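Lemma 4.14 says that the fraction of design points in a (possibly shrinking) interval is at most of order its length plus $n^{-1/2}$. The sketch below uses a hypothetical design of the form used in the proof: $x_i=H^{-1}(i/n)-\delta_i$ with $H(x)=(x+x^2)/2$, i.e. density $h(x)=(1+2x)/2\ge1/2$, and deterministic perturbations $|\delta_i|\le0.3\,n^{-1/2}$. The bound $1.5(b-a+0.6n^{-1/2})+1/n$ comes from $\max h=1.5$ and applies only to this toy design.

```python
import numpy as np

def design(n, jitter=0.3):
    """Hypothetical design x_i = H^{-1}(i/n) - delta_i with H(x) = (x + x^2)/2
    (density h(x) = (1 + 2x)/2 on [0, 1]) and |delta_i| <= jitter * n^{-1/2}."""
    u = np.arange(1, n + 1) / n
    x = (-1.0 + np.sqrt(1.0 + 8.0 * u)) / 2.0          # H^{-1}(u)
    delta = jitter * np.sin(np.arange(1, n + 1)) / np.sqrt(n)
    return x - delta

def fraction_in(x, a, b):
    """n^{-1} #{i : x_i in [a, b]}."""
    return float(np.mean((x >= a) & (x <= b)))

n = 10_000
x = design(n)
for a, b in [(0.2, 0.2 + 1e-3), (0.5, 0.51), (0.1, 0.6)]:
    bound = 1.5 * (b - a + 0.6 / np.sqrt(n)) + 1.0 / n
    print(a, b, fraction_in(x, a, b), bound)
```

For the long interval the fraction is close to $H(0.6)-H(0.1)=0.425$. For very short intervals the $n^{-1/2}$ term dominates the bound.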

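Proof of Lemma 4.13. Expand (18) to obtain

$$Z_n(\hat b,\hat\tau)=\frac2n\sum_{i=1}^n\varepsilon_i\bigl(g(x_i,b,\tau)-g(x_i,\hat b,\hat\tau)\bigr)+\frac1n\sum_{i=1}^n\bigl(g(x_i,b,\tau)-g(x_i,\hat b,\hat\tau)\bigr)^2+\|\varepsilon\|_n^2.\tag{21}$$

Note that the last term equals $Z_n(b,\tau)$. We first estimate the second term of (21). Denote the points of discontinuity of $\phi$ by $J(\phi)=\{\vartheta_1,\dots,\vartheta_{\#J(\phi)}\}$ with $\vartheta_1<\vartheta_2<\dots<\vartheta_{\#J(\phi)}$. The point $\vartheta_s$ lies between $x-\tau_j$ and $x-\hat\tau_j$ exactly when $x$ lies between $\vartheta_s+\tau_j$ and $\vartheta_s+\hat\tau_j$. Here and below, intervals are written with their endpoints in increasing order. This means

$$[x-\tau_j,x-\hat\tau_j]\cap J(\phi)\ne\emptyset\iff\exists s:\ x\in[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j].$$

By Lemma 4.14,

$$\#\{i:x_i\in[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]\}=O_P(n|\tau_j-\hat\tau_j|+n^{1/2}).$$

This gives

$$\frac1n\sum_{i=1}^n\sum_{j=1}^k1_{\{[x_i-\tau_j,x_i-\hat\tau_j]\cap J(\phi)\ne\emptyset\}}\le\frac1n\sum_{i=1}^n\sum_{j=1}^k\sum_{s=1}^{\#J(\phi)}1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i)=O_P(\|\Delta\|+n^{-1/2}).$$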
The functions $\nu_j(x)$ are piecewise Lipschitz continuous by part (iv) of Lemma 4.1. With the help of Lemma 4.2 this gives

$$\frac1n\sum_{i=1}^n(\Delta^t\nu(x_i))^2=\sum_{j,r=1}^{2k+1}\Delta_j\Delta_r\frac1n\sum_{i=1}^n\nu_j(x_i)\nu_r(x_i)=\sum_{j,r=1}^{2k+1}\Delta_j\Delta_r\int_0^1\nu_j(x)\nu_r(x)h(x)\,dx+o(\|\Delta\|^2)=\Delta^tV\Delta+o(\|\Delta\|^2).\tag{22}$$

Use Lemma 4.12 and the results above to obtain

$$\begin{aligned}\frac1n\sum_{i=1}^n\bigl(g(x_i,b,\tau)-g(x_i,\hat b,\hat\tau)\bigr)^2&=\frac1n\sum_{i=1}^n\Bigl(-\Delta^t\nu(x_i)+O(\|\Delta\|^2)+O(\|\Delta\|)\sum_{j=1}^k\sum_{s=1}^{\#J(\phi)}1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i)\Bigr)^2\\&=\frac1n\sum_{i=1}^n\bigl(\Delta^t\nu(x_i)+O(\|\Delta\|^2)\bigr)^2+O(\|\Delta\|^2)\,O_P(\|\Delta\|+n^{-1/2})\\&=\Delta^tV\Delta+O_P(\|\Delta\|^3)+o_P(\|\Delta\|^2),\end{aligned}$$

where $V$ is given by (13). The remainder terms clearly satisfy condition (17).
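Next, examine the first term of (21). Set

$$W_n=n^{-1/2}\sum_{i=1}^n\varepsilon_i\nu(x_i)$$

to derive

$$\begin{aligned}\frac1n\sum_{i=1}^n\varepsilon_i\bigl(g(x_i,b,\tau)-g(x_i,\hat b,\hat\tau)\bigr)&=\frac1n\sum_{i=1}^n\varepsilon_i\Bigl(-\Delta^t\nu(x_i)+O(\|\Delta\|^2)+O(\|\Delta\|)\sum_{j=1}^k\sum_{s=1}^{\#J(\phi)}1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i)\Bigr)\\&=-\frac{\Delta^tW_n}{n^{1/2}}+\frac{O(\|\Delta\|^2)}{n}\sum_{i=1}^n\varepsilon_i+\frac{O(\|\Delta\|)}{n}\sum_{i=1}^n\sum_{j=1}^k\sum_{s=1}^{\#J(\phi)}\varepsilon_i1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i).\end{aligned}$$

The second term is clearly $o_P(\|\Delta\|^2)$.

To obtain an upper bound for the third term, suppose $\vartheta_s+\tau_j<\vartheta_s+\hat\tau_j$. Set

$$i_l(s,j)=\min\{i:x_i\ge\vartheta_s+\tau_j\}\quad\text{and}\quad i_u(s,j)=\max\{i:x_i\le\vartheta_s+\hat\tau_j\}.$$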
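Consequently,

$$\sum_{i=1}^n\varepsilon_i1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i)=\sum_{i=i_l(s,j)}^{i_u(s,j)}\varepsilon_i.$$

By the law of the iterated logarithm for i.i.d. $\varepsilon_1,\varepsilon_2,\dots$ with $E(\varepsilon_1)=0$ and $E(\varepsilon_1^2)<\infty$, we have

$$\limsup_{m\to\infty}\bigl(2E(\varepsilon_1^2)\,m\log\log m\bigr)^{-1/2}\max_{l\in\{1,\dots,m\}}\Bigl|\sum_{i=1}^l\varepsilon_i\Bigr|=1$$

almost surely. For $\delta_n=i_u(s,j)-i_l(s,j)$ this implies that

$$\max_{m=0,\dots,\delta_n}\Bigl|\sum_{i=i_l(s,j)}^{i_l(s,j)+m}\varepsilon_i\Bigr|=O\bigl((\delta_n\log\log\delta_n)^{1/2}\bigr)$$

holds almost surely. By Lemma 4.14,

$$\delta_n\le\#\{i:\vartheta_s+\tau_j\le x_i\le\vartheta_s+\hat\tau_j\}=O_P(n|\tau_j-\hat\tau_j|+\sqrt n)=O_P(n\|\Delta\|+\sqrt n).$$

Consequently,

$$\sum_{i=1}^n\varepsilon_i1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i)=O_P\Bigl(\sqrt{(n\|\Delta\|+n^{1/2})\log\log(n\|\Delta\|+n^{1/2})}\Bigr).$$

The same can be shown for $\vartheta_s+\tau_j>\vartheta_s+\hat\tau_j$. Since $J(\phi)$ is a finite set and $k<\infty$, it follows that

$$\frac{O(\|\Delta\|)}{n}\sum_{i=1}^n\sum_{j=1}^k\sum_{s=1}^{\#J(\phi)}\varepsilon_i1_{[\vartheta_s+\tau_j,\vartheta_s+\hat\tau_j]}(x_i)=O(n^{-1}\|\Delta\|)\,O_P\Bigl(\sqrt{(n\|\Delta\|+n^{1/2})\log\log(n\|\Delta\|+n^{1/2})}\Bigr).\tag{23}$$

To verify condition (17) for this term, note that for $\|\Delta\|\le n^{-1/2}$,

$$(23)=O_P\bigl(n^{-5/4}\sqrt{\log\log(n^{1/2})}\bigr)=o_P(n^{-1}),$$

and for $\|\Delta\|>n^{-1/2}$,

$$(23)=O_P\bigl(\|\Delta\|^{3/2}n^{-1/2}\sqrt{\log\log n}\bigr)=o_P(\|\Delta\|^2).$$

This gives

$$\frac1n\sum_{i=1}^n\varepsilon_i\bigl(g(x_i,b,\tau)-g(x_i,\hat b,\hat\tau)\bigr)=-n^{-1/2}\Delta^tW_n+o_P(\|\Delta\|^2)+o_P(n^{-1}).$$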
Next, take a closer look at $W_n$. For any $a\in\mathbb R^{2k+1}$,

$$a^tW_n=\sum_{i=1}^n\varepsilon_i\Bigl(n^{-1/2}\sum_{j=1}^{2k+1}a_j\nu_j(x_i)\Bigr),$$

and by calculations similar to those in (22),

$$\sum_{i=1}^n\Bigl(n^{-1/2}\sum_{j=1}^{2k+1}a_j\nu_j(x_i)\Bigr)^2=\frac1n\sum_{i=1}^n(a^t\nu(x_i))^2=a^tVa+o(1).$$

By the central limit theorem (in its Lindeberg-Feller form for triangular arrays, which applies since $\nu$ is bounded) and the Cramér-Wold device,

$$W_n\xrightarrow{\mathcal D}N(0,\sigma^2V),$$

where $\sigma^2=E(\varepsilon_1^2)$ and $V$ is given by (13). □

Lemma 4.15. Given Assumptions B and C, the matrix $V$ defined by (13) is positive definite.

Proof. For any $\beta\in\mathbb R^{2k+1}$,

$$\beta^tV\beta=\int_0^1\Bigl(\sum_{i=1}^{2k+1}\beta_i\nu_i(x)\Bigr)^2h(x)\,dx\ge c_1\int_0^1\Bigl(\sum_{i=1}^{2k+1}\beta_i\nu_i(x)\Bigr)^2dx.$$

By Assumption B, the functions $\nu_1,\dots,\nu_{2k+1}$ are linearly independent as functions in $L_2([0,1])$, since $b_i-b_{i+1}\ne0$ for all $i=1,\dots,k$. Consequently, for $\beta\ne0$ we have

$$\int_0^1\Bigl(\sum_{i=1}^{2k+1}\beta_i\nu_i(x)\Bigr)^2dx>0,$$

and thus $\beta^tV\beta>0$. □
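The role of the condition $b_i\ne b_{i+1}$ in Lemma 4.15 can be seen numerically. Definition (13) is not part of this section, so the sketch follows our reading of Lemma 4.12: $\nu$ has level components $\Phi1_{[\tau_{j-1},\tau_j)}$ and jump components $(b_j-b_{j+1})\phi(\cdot-\tau_j)$, in the order of $\Delta$ in (20), and $V_{jr}=\int_0^1\nu_j\nu_rh\,dx$ as in (22). It further assumes a Gaussian kernel, a uniform design density $h\equiv1$, and the convention $\tau_0=0$, $\tau_{k+1}=1$. If two adjacent levels coincide, a jump column vanishes and $V$ becomes singular.

```python
import math
import numpy as np

S = 0.1                                # hypothetical Gaussian kernel bandwidth
_erf = np.vectorize(math.erf)          # elementwise erf without SciPy

def phi(u):
    return np.exp(-0.5 * (u / S) ** 2) / (S * math.sqrt(2.0 * math.pi))

def conv_indicator(x, a, b):
    """(Phi 1_[a,b))(x) = int_a^b phi(x - y) dy for the Gaussian kernel."""
    cdf = lambda z: 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return cdf((x - a) / S) - cdf((x - b) / S)

def nu(x, b, tau):
    """Columns nu_1, ..., nu_{2k+1} ordered like Delta in (20): level columns
    Phi 1_[tau_{j-1}, tau_j) and jump columns (b_j - b_{j+1}) phi(x - tau_j).
    Uses the hypothetical convention tau_0 = 0, tau_{k+1} = 1."""
    edges = np.concatenate(([0.0], tau, [1.0]))
    cols = []
    for j in range(len(b)):
        cols.append(conv_indicator(x, edges[j], edges[j + 1]))
        if j < len(tau):
            cols.append((b[j] - b[j + 1]) * phi(x - tau[j]))
    return np.stack(cols, axis=1)

def gram_V(b, tau, m=4000):
    """V_{jr} = int_0^1 nu_j(x) nu_r(x) h(x) dx with h = 1, midpoint rule on m nodes."""
    x = (np.arange(m) + 0.5) / m
    N = nu(x, b, tau)
    return N.T @ N / m

tau = np.array([0.3, 0.7])
V = gram_V(np.array([0.0, 1.0, -0.5]), tau)        # adjacent levels differ
V_flat = gram_V(np.array([0.0, 0.0, -0.5]), tau)   # b_1 == b_2: no jump at tau_1
print(np.linalg.eigvalsh(V).min(), np.linalg.eigvalsh(V_flat).min())
```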
4.5. Proof of the main theorem for a known number of jumps

The proof of the main theorem is now a direct consequence of the results given above. Part (v) follows directly from the proof of Lemma 4.13.

Proof of part (i). Corollary 4.10 implies $\|\theta-\hat\theta_n\|=o_P(1)$. By relation (19) and Lemma 4.13, the assumptions of Theorem 4.11 are satisfied. The claim follows by application of this theorem.



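Proof of part (ii). By Lemma 4.12,

$$\int_0^1\Bigl(\sum_{i=1}^{k+1}\bigl(b_i\Phi1_{[\tau_{i-1},\tau_i)}(x)-\hat b_i\Phi1_{[\hat\tau_{i-1},\hat\tau_i)}(x)\bigr)\Bigr)^2dx=\int_0^1\bigl((\hat\theta_n-\theta)^t\nu(x)\bigr)^2dx+o_P(\|\hat\theta_n-\theta\|^2)=O_P(n^{-1}),$$

since $\nu(x)$ is bounded and $\|\hat\theta_n-\theta\|=O_P(n^{-1/2})$ by part (i). This proves the claim.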



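Proof of parts (iii) and (iv). Note that

$$\begin{aligned}\|f-\hat f_n\|_2^2&=\sum_{i=1}^{k+1}(b_i-\hat b_i)^2\bigl(\min(\tau_i,\hat\tau_i)-\max(\tau_{i-1},\hat\tau_{i-1})\bigr)_+\\&\quad+\sum_{i=1}^k\bigl(1_{\{\hat\tau_i<\tau_i\}}(b_i-\hat b_{i+1})^2+1_{\{\tau_i<\hat\tau_i\}}(b_{i+1}-\hat b_i)^2\bigr)|\tau_i-\hat\tau_i|\\&=O_P(n^{-1})O_P(1)+O_P(1)O_P(n^{-1/2})=O_P(n^{-1/2}).\end{aligned}$$

This proves part (iv). Part (iii) follows by application of Lemma 4.7. □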



4.6. Proof of the theorem for an unknown number of jumps

In this section we analyze the case where the number of jumps is unknown.

To reconstruct the number of jumps correctly, it helps to use a penalty that is strictly increasing in the number of jumps. A penalty term that depends only on the number of jumps is not a pseudo-norm on $\mathcal T_{\infty,R}$, since $J_\#(\lambda f)=J_\#(f)$ for $\lambda\ne0$. Hence the standard results from empirical process theory do not apply. However, similar techniques can be used in the proofs.

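Since $\hat f_{\lambda_n}$ (approximately) minimizes the penalized $L_2$ functional, for any $f\in\mathcal T_{\infty,R}$ we get

$$\|\Phi\hat f_{\lambda_n}-Y\|_n^2+\lambda_nJ_\#(\hat f_{\lambda_n})\le\|\Phi f-Y\|_n^2+\lambda_nJ_\#(f)+o(n^{-1}).$$

This gives

$$\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^2+2\langle\Phi\hat f_{\lambda_n}-\Phi f,-\varepsilon\rangle_n+\|\varepsilon\|_n^2+\lambda_nJ_\#(\hat f_{\lambda_n})\le\|\varepsilon\|_n^2+\lambda_nJ_\#(f)+o(n^{-1}),$$

which yields the basic inequality

$$\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^2+\lambda_nJ_\#(\hat f_{\lambda_n})\le2\langle\Phi\hat f_{\lambda_n}-\Phi f,\varepsilon\rangle_n+\lambda_nJ_\#(f)+o(n^{-1}).\tag{24}$$

Hence a bound for the term $|\langle\Phi\hat f_{\lambda_n}-\Phi f,\varepsilon\rangle_n|$ would allow immediate conclusions on $\|\Phi\hat f_{\lambda_n}-\Phi f\|_n$ as well as on $\lambda_nJ_\#(\hat f_{\lambda_n})$.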

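Theorem 4.16. Suppose Assumption C is met and the errors satisfy (A1). Assume $\sup_{g\in\mathcal G}\|g\|_n\le R$. There exists a constant $C$ depending only on (A1) such that for all $\delta>0$ satisfying

$$\sqrt n\,\delta\ge C\Bigl(\int_0^RH^{1/2}(u,\mathcal G,Q_n)\,du\vee R\Bigr)\tag{25}$$

we have

$$P\Bigl(\sup_{g\in\mathcal G}\Bigl|\frac1n\sum_{i=1}^n\varepsilon_ig(x_i)\Bigr|\ge\delta\Bigr)\le C\exp\Bigl(-\frac{n\delta^2}{C^2R^2}\Bigr).\tag{26}$$

Proof. See Lemma 3.2, page 29 in van de Geer (2000). □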



A bound of this type can be obtained from the following exponential inequality.

Lemma 4.17. Suppose Assumptions A and B are met and the errors additionally satisfy (A1). There exist constants $c_1,c_2>0$ such that for all $t>c_1n^{-1/2}$ we have

$$P\Biggl(\sup_{f\in\mathcal T_{\infty,R}}\frac{|\langle\Phi f,\varepsilon\rangle_n|}{\|\Phi f\|_nJ_\#^{1/2}(f)\bigl(1+\log(J_\#(f)/\|\Phi f\|_n)_+\bigr)}>t\Biggr)\le c_2\exp\Bigl(-\frac{nt^2}{c_2}\Bigr).$$

Proof. Set $\mathcal G_{k,R}(\Phi)=\{\Phi g:g\in\mathcal T_{k,R}\}$. By Corollary 4.5 there exists a constant $C>0$ independent of $u$, $k$, $R$ and $n$ such that

$$H(u,\mathcal G_{k-1,R}(\Phi),Q_n)\le Ck\Bigl(1+\log\frac{Rk}{u}\Bigr).$$

Compute, for $0<\delta\le Rk$,

$$\begin{aligned}\int_0^\delta H^{1/2}(u,\mathcal G_{k-1,R}(\Phi),Q_n)\,du&\le\sqrt{Ck}\int_0^\delta\sqrt{\log\frac{eRk}{u}}\,du=eRk\sqrt{Ck}\int_0^{\delta/(eRk)}\sqrt{-\log u}\,du\\&\le eRk\sqrt{Ck}\int_0^{\delta/(eRk)}(-\log u)\,du=\delta\sqrt{Ck}\,\bigl(2+\log R+\log k-\log\delta\bigr)\\&\le C_1\delta\sqrt k\Bigl(1+\log\Bigl(\frac k\delta\vee1\Bigr)\Bigr)=C_1\delta\sqrt k\Bigl(1+\log\Bigl(\frac k\delta\Bigr)_+\Bigr),\end{aligned}$$

where $C_1$ is some finite constant independent of $k$ and $\delta$. The second inequality uses $\sqrt{-\log u}\le-\log u$ for $u\le e^{-1}$. By Theorem 4.16 there exists a constant $C_2$, depending only on the sub-Gaussian error condition (A1), such that

$$\sqrt n\rho\ge C_2\Bigl(\int_0^\delta H^{1/2}(u,\mathcal G_{k-1,R}(\Phi,\delta),Q_n)\,du\vee\delta\Bigr)$$

implies

$$P\Bigl(\sup_{g\in\mathcal G_{k-1,R}(\Phi,\delta)}|\langle g,\varepsilon\rangle_n|\ge\rho\Bigr)\le C_2\exp\Bigl(-\frac{n\rho^2}{C_2^2\delta^2}\Bigr),$$

where $\mathcal G_{k-1,R}(\Phi,\delta)=\{g\in\mathcal G_{k-1,R}(\Phi):\|g\|_n\le\delta\}$. Consequently, for all $t>C_1C_2n^{-1/2}$ we have

$$P\Bigl(\sup_{g\in\mathcal G_{k-1,R}(\Phi,\delta)}|\langle g,\varepsilon\rangle_n|\ge t\delta\sqrt k\bigl(1+\log(k/\delta)_+\bigr)\Bigr)\le C_2\exp\Bigl(-\frac{nt^2k(1+\log(k/\delta)_+)^2}{C_2^2}\Bigr).$$

Split $\mathcal G_{k-1,R}(\Phi)$ into the shells $2^{-s}R<\|g\|_n\le2^{1-s}R$, $s\ge1$, and apply the last bound with $\delta=2^{1-s}R$ and $t/2$ in place of $t$. For $t>2C_1C_2n^{-1/2}$ we arrive at

$$\begin{aligned}&P\Bigl(\sup_{g\in\mathcal G_{k-1,R}(\Phi)}\frac{|\langle g,\varepsilon\rangle_n|}{\|g\|_n\sqrt k\,(1+\log(k/\|g\|_n)_+)}>t\Bigr)\\&\quad\le\sum_{s=1}^\infty P\Bigl(\sup_{g\in\mathcal G_{k-1,R}(\Phi,2^{1-s}R)}|\langle g,\varepsilon\rangle_n|>t2^{-s}R\sqrt k\bigl(1+\log(2^{s-1}k/R)_+\bigr)\Bigr)\\&\quad\le\sum_{s=1}^\infty C_2\exp\Bigl(-\frac{t^2n\bigl(1+((s-1)\log2-\log R)_+\bigr)}{4C_2^2}\Bigr).\end{aligned}$$

Splitting this sum at $s_R:=\lceil1+\log(R)/\log(2)\rceil$ gives the bound

$$\le C_5\exp\Bigl(-\frac{t^2n}{C_3}\Bigr)+C_2\exp\Bigl(-\frac{t^2n}{C_3}\Bigr)\sum_{m=1}^\infty\exp(-t^2nC_4m)\le C_6\exp\Bigl(-\frac{t^2n}{C_3}\Bigr).$$

Here $C_3,C_4,C_5,C_6$ are constants depending only on $C_1$, $C_2$ and $R$. The last inequality holds because $t^2n>C_1^2C_2^2$, so the geometric series is bounded by a constant.

Since the constant $C_6$ does not depend on $k$, the exponential inequality also holds if we additionally take the supremum over all $k$. This proves the claim with $c_1=2C_1C_2$ and $c_2=C_6\vee C_3$. □
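The entropy-integral step in the proof of Lemma 4.17 rests on the substitution $u=eRk\,v$ and on the bound $\sqrt{-\log v}\le-\log v$ for $v\le1/e$. The sketch below checks the step numerically with the closed form $\int_0^a\sqrt{-\log v}\,dv=a\sqrt{-\log a}+\tfrac{\sqrt\pi}{2}\operatorname{erfc}(\sqrt{-\log a})$. That closed form is our addition; it can be verified by differentiation. The sketch also assumes the constant $C$ of Corollary 4.5 equals 1, with hypothetical values $R=1$, $k=3$.

```python
import math
import numpy as np

def int_sqrt_neg_log(a):
    """Closed form of int_0^a sqrt(-log v) dv for 0 < a < 1."""
    r = math.sqrt(-math.log(a))
    return a * r + 0.5 * math.sqrt(math.pi) * math.erfc(r)

def midpoint(f, a, m=200_000):
    """Midpoint rule on [0, a]; f may be singular (but integrable) at 0."""
    u = (np.arange(m) + 0.5) * (a / m)
    return float(np.sum(f(u)) * a / m)

def entropy_integral(delta, R, k):
    """int_0^delta sqrt(1 + log(R k / u)) du via the substitution u = e R k v."""
    s = math.e * R * k
    return s * int_sqrt_neg_log(delta / s)

def entropy_bound(delta, R, k):
    """delta * (2 + log R + log k - log delta), as in the proof (needs delta <= R k)."""
    return delta * (2.0 + math.log(R) + math.log(k) - math.log(delta))

a = 0.05
print(int_sqrt_neg_log(a), midpoint(lambda u: np.sqrt(-np.log(u)), a))
for delta in (1e-3, 1e-1, 1.0):
    print(delta, entropy_integral(delta, R=1.0, k=3), entropy_bound(delta, R=1.0, k=3))
```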

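The above lemma yields upper bounds for the rate of $|\langle\Phi f,\varepsilon\rangle_n|$, which are stated in the following corollary.

Corollary 4.18. Suppose the prerequisites of Lemma 4.17 are met. Then, uniformly in $f\in\mathcal T_{\infty,R}$,

$$|\langle\Phi f,\varepsilon\rangle_n|=\|\Phi f\|_n\sqrt{J_\#(f)}\bigl(1+\log(J_\#(f)/\|\Phi f\|_n)_+\bigr)O_P(n^{-1/2}).$$

Moreover, for each $\epsilon>0$ we have, uniformly in $f\in\mathcal T_{\infty,R}$,

$$|\langle\Phi f,\varepsilon\rangle_n|=\|\Phi f\|_n^{1-\epsilon}\bigl(J_\#(f)\bigr)^{(1+2\epsilon)/2}O_P(n^{-1/2}).$$

Proof. The first equation follows directly from Lemma 4.17. For the second equation, observe that $J_\#(f)\ge1$ and that $\sqrt x(1+\log x)\le cx^{1/2+\epsilon}$ for $x\ge1$, $\epsilon>0$ and $c\ge\epsilon^{-1}\vee1$. Moreover, fix $x_0<\infty$; if $c$ is large enough, then $x(1+\log(x^{-1})_+)\le cx^{1-\epsilon}$ for all $0<x\le x_0$. This applies to $x=\|\Phi f\|_n$, because $\|\Phi f\|_n\le\|\phi\|_{L_1}R$ for $f\in\mathcal T_{\infty,R}$. Also, $1+\log(y/x)_+\le(1+\log y)(1+\log(x^{-1})_+)$ for $y\ge1$. Combining these observations with the first equation yields the second. □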
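The two elementary inequalities in the proof of Corollary 4.18 can be checked numerically. The sketch below uses the sufficient choice $c=\max(1/\epsilon,1)$, which follows from maximising $(1+\log x)/x^{\epsilon}$ at $\log x=1/\epsilon-1$. It evaluates both sides on logarithmic grids that contain the maximisers. The second inequality is checked only on $(0,1]$, where the positive part is inactive; this restriction is ours.

```python
import numpy as np

def check_cor_4_18(eps):
    """Check sqrt(x)(1 + log x) <= c x^{1/2 + eps} for x >= 1 and
    x (1 + log(1/x)) <= c x^{1 - eps} for 0 < x <= 1, with c = max(1/eps, 1)."""
    c = max(1.0 / eps, 1.0)
    x_big = np.logspace(0, 12, 20_001)
    ok1 = np.all(np.sqrt(x_big) * (1 + np.log(x_big))
                 <= c * x_big ** (0.5 + eps) * (1 + 1e-12))
    x_small = np.logspace(-12, 0, 20_001)
    ok2 = np.all(x_small * (1 + np.log(1 / x_small))
                 <= c * x_small ** (1 - eps) * (1 + 1e-12))
    return bool(ok1 and ok2)

print({eps: check_cor_4_18(eps) for eps in (0.05, 0.1, 0.5, 1.0, 2.0)})
```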

We are now in a position to prove that, with a proper choice of the penalty term, the penalized estimator $\hat f_{\lambda_n}$ estimates the number of jumps correctly with probability tending to one as $n\to\infty$.
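The penalized estimator minimises $\|\Phi f-Y\|_n^2+\lambda_nJ_\#(f)$. Below is a brute-force sketch of that criterion. Its assumptions are not from the paper: a Gaussian kernel with bandwidth 0.05, an equidistant design, noise level $\sigma=0.1$, and the convention $\tau_0=0$, $\tau_{k+1}=1$. Candidate jumps are restricted to a grid, $k\le2$, and $\lambda_n=n^{-1/2}$, which satisfies $\lambda_nn^{1/(1+\epsilon)}\to\infty$ for every $\epsilon<1$. For fixed jump locations the fit is linear in the levels, as noted in the proof of Theorem 3.3. The helper names are hypothetical, and the grid search stands in for an exact minimisation.

```python
import itertools
import math
import numpy as np

S, SIGMA = 0.05, 0.1                   # hypothetical kernel bandwidth and noise level
_erf = np.vectorize(math.erf)

def conv_indicator(x, a, b):
    """(Phi 1_[a,b))(x) = int_a^b phi(x - y) dy for a N(0, S^2) kernel phi."""
    cdf = lambda z: 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return cdf((x - a) / S) - cdf((x - b) / S)

def fit_given_jumps(x, y, jumps):
    """With the jumps fixed the model is linear in the levels b: least squares.
    Uses the hypothetical convention tau_0 = 0 and tau_{k+1} = 1."""
    edges = np.concatenate(([0.0], jumps, [1.0]))
    X = np.column_stack([conv_indicator(x, edges[j], edges[j + 1])
                         for j in range(len(edges) - 1)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ b) ** 2)), b     # ||Phi f - Y||_n^2 and levels

def penalized_fit(x, y, lam, k_max=2, grid=np.linspace(0.05, 0.95, 46)):
    """Exhaustive search over at most k_max jumps on a grid, minimising
    ||Phi f - Y||_n^2 + lam * (k + 1); a constant shift of the penalty
    does not change the minimiser."""
    best = None
    for k in range(k_max + 1):
        for jumps in itertools.combinations(grid, k):
            rss, b = fit_given_jumps(x, y, np.array(jumps))
            crit = rss + lam * (k + 1)
            if best is None or crit < best[0]:
                best = (crit, k, jumps, b)
    return best

rng = np.random.default_rng(0)
n = 200
x = np.arange(1, n + 1) / n                                          # equidistant design
y = conv_indicator(x, 0.5, 1.0) + SIGMA * rng.standard_normal(n)     # f = 1_[0.5, 1)
_, k_hat, jumps_hat, b_hat = penalized_fit(x, y, lam=n ** -0.5)
print(k_hat, jumps_hat, b_hat)
```

The penalty rules out the second jump: adding one lowers the residual sum of squares by far less than $\lambda_n$. Dropping the true jump, by contrast, increases it by far more.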
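Proof of the theorem for an unknown number of jumps. Application of Corollary 4.18 to (24) gives

$$\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^2\le\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^{1-\epsilon}J_\#(\hat f_{\lambda_n}-f)^{1/2+\epsilon}O_P(n^{-1/2})+\lambda_n\bigl(J_\#(f)-J_\#(\hat f_{\lambda_n})\bigr)+o(n^{-1}),\tag{27}$$

where $\epsilon$ is given by the condition $\lambda_nn^{1/(1+\epsilon)}\to\infty$.

First, assume $J_\#(\hat f_{\lambda_n})\le J_\#(f)$. Then $J_\#(\hat f_{\lambda_n}-f)$ is bounded and (27) implies that either

$$\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^2=O(\lambda_n)+o(n^{-1})\quad\text{or}\quad\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^{1+\epsilon}=O_P(n^{-1/2}).$$

Thus $\|\Phi\hat f_{\lambda_n}-\Phi f\|_n=o_P(1)$. By Lemma 4.2 this implies $\|\Phi\hat f_{\lambda_n}-\Phi f\|_2=o_P(1)$. With the help of Lemma 4.7 it follows that $d(J(\hat f_{\lambda_n}),J(f))=o_P(1)$, which in turn implies $J_\#(\hat f_{\lambda_n})\ge J_\#(f)$ eventually.

Now assume $J_\#(\hat f_{\lambda_n})>J_\#(f)$. Then (27) yields

$$\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^2\le\|\Phi\hat f_{\lambda_n}-\Phi f\|_n^{1-\epsilon}J_\#(\hat f_{\lambda_n}-f)^{1/2+\epsilon}O_P(n^{-1/2})+o(n^{-1}).$$

Assume $n_k$ is a subsequence such that $\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1+\epsilon}\ge cn_k^{-1/2}$ for some $c>0$. Dividing the last inequality by $\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1-\epsilon}$ gives

$$\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1+\epsilon}\le J_\#(\hat f_{\lambda_{n_k}}-f)^{1/2+\epsilon}O_P(n_k^{-1/2})+o(n_k^{-1/2})=J_\#(\hat f_{\lambda_{n_k}}-f)^{1/2+\epsilon}O_P(n_k^{-1/2}).$$

This yields a bound for $\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1-\epsilon}$ in terms of $J_\#(\hat f_{\lambda_{n_k}}-f)$ and $n_k$. Moreover, by (27),

$$\lambda_{n_k}\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)\le O_P(n_k^{-1/2})\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1-\epsilon}J_\#(\hat f_{\lambda_{n_k}}-f)^{1/2+\epsilon}+o(n_k^{-1}).$$

Combining the last two bounds, we obtain

$$\lambda_{n_k}\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)\le O_P(n_k^{-1/(1+\epsilon)})J_\#(\hat f_{\lambda_{n_k}}-f)^{(1+\epsilon-\epsilon^2)/(1+\epsilon)}.\tag{28}$$

Now assume $n_k$ is a subsequence such that $\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1+\epsilon}<cn_k^{-1/2}$ for some $c>0$. Application of Corollary 4.18 to (24) and the observation that $J_\#(g)\ge1$ for all $g$ give

$$\lambda_{n_k}\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)\le O_P(n_k^{-1/2})\|\Phi\hat f_{\lambda_{n_k}}-\Phi f\|_{n_k}^{1-\epsilon}J_\#(\hat f_{\lambda_{n_k}}-f)^{1/2+\epsilon}+o(n_k^{-1}).$$

Every sequence can be split into a subsequence whose elements are all smaller than $cn^{-1/2}$ and a subsequence whose elements are all greater than or equal to $cn^{-1/2}$, for some $c>0$. Hence we have shown that $J_\#(\hat f_{\lambda_n})>J_\#(f)$ implies (28).

Now we show that $J_\#(\hat f_{\lambda_n})-J_\#(f)\to0$ in probability. To this end, assume there exists some subsequence $n_k$ such that

$$J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\ge c>0.\tag{29}$$

This implies $J_\#(f)\le J_\#(f)c^{-1}\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)$ and

$$\begin{aligned}J_\#(\hat f_{\lambda_{n_k}}-f)&\le2\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)+2J_\#(f)\le\bigl(2+2J_\#(f)c^{-1}\bigr)\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)\\&=O(1)\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr).\end{aligned}$$

Hence

$$O_P(n_k^{-1/(1+\epsilon)})J_\#(\hat f_{\lambda_{n_k}}-f)^{(1+\epsilon-\epsilon^2)/(1+\epsilon)}=O_P(n_k^{-1/(1+\epsilon)})\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)^{(1+\epsilon-\epsilon^2)/(1+\epsilon)}.$$

Together with (28), the assumption $\lambda_{n_k}n_k^{1/(1+\epsilon)}\to\infty$ and (29), this gives

$$0<c^{\epsilon^2/(1+\epsilon)}\le\bigl(J_\#(\hat f_{\lambda_{n_k}})-J_\#(f)\bigr)^{\epsilon^2/(1+\epsilon)}=O_P\bigl(\lambda_{n_k}^{-1}n_k^{-1/(1+\epsilon)}\bigr)=o_P(1).$$

This is a contradiction, so $J_\#(\hat f_{\lambda_n})-J_\#(f)\to0$ in probability. Since $J_\#(f)$ and $J_\#(\hat f_{\lambda_n})$ are integers, this yields

$$P\bigl(J_\#(\hat f_{\lambda_n})=J_\#(f)\bigr)\to1$$

as $n\to\infty$. This proves the claim.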



□ 



5. Proof of Theorem 2.1

To prove Theorem 2.1 part (i), we define the native Hilbert space $\mathcal N_\phi$ of a positive definite function $\phi$. We then show that the elements $\delta_x(f)=f(x)$ and $\rho_{x,y}(f)=\int_x^yf(t)\,dt$ of its dual space are linearly independent if $\phi$ has certain properties. From this we deduce that the functions $A_\phi(\cdot,\tau_0,\tau_1),\dots,A_\phi(\cdot,\tau_k,\tau_{k+1})$ are linearly independent.

The assumptions of Theorem 2.1 part (i) imply that the Fourier transform $\hat\phi$ is strictly positive. This means that $\phi$ is positive definite. (For a definition and characterization of real-valued positive definite functions, see Chapter 6 in Wendland (2005).)

For a positive definite function $\phi$ and $\Omega\subset\mathbb R$, let $\mathcal N_\phi(\Omega)$ denote the unique Hilbert space $(\mathcal H,\langle\cdot,\cdot\rangle_{\mathcal H})$ of functions $f:\Omega\to\mathbb R$ satisfying $f(x)=\langle f,\phi(x-\cdot)\rangle_{\mathcal H}$. $\mathcal N_\phi(\Omega)$ is called the native space for $\phi$. It is the closure of the span of the function set $\{\phi(x-\cdot):x\in\Omega\}$ under the inner product induced by $\langle\phi(x-\cdot),\phi(y-\cdot)\rangle=\phi(x-y)$. A short introduction to native spaces, along with some basic results of the theory, can be found in Schaback (1999).

Denote by

$$\mathcal S(\mathbb R)=\Bigl\{f\in C^\infty(\mathbb R,\mathbb C):\lim_{|x|\to\infty}|x^nf^{(m)}(x)|=0\text{ for all }n,m=0,1,2,\dots\Bigr\}$$

the Schwartz space, where $C^\infty(\mathbb R,\mathbb C)$ is the set of smooth functions from $\mathbb R$ to $\mathbb C$. The first result is that the native space $\mathcal N_\phi(\Omega)$ contains all Schwartz functions with support contained in $\Omega$.

Lemma 5.1. Assume $\Omega\subset\mathbb R$ and $\phi$ satisfies the conditions given by Theorem 2.1 part (i). Then all real Schwartz functions with support contained in $\Omega$ are elements of the native space $\mathcal N_\phi(\Omega)$, that is,

$$\{f\in\mathcal S(\mathbb R):\operatorname{supp}(f)\subset\Omega\}\subset\mathcal N_\phi(\Omega).$$
Proof. We first prove the claim for $\Omega=\mathbb R$. Assume $f\in\mathcal S(\mathbb R)$ is real-valued. Since the Fourier transform is a bijection from $\mathcal S(\mathbb R)$ to $\mathcal S(\mathbb R)$, $\hat f$ is also a Schwartz function. Hence for any $n_0\in\mathbb N$ we can find a constant $c_1>0$ such that $|\hat f(x)|^2\le c_1(1+|x|^{n_0+2})^{-1}$. By the assumptions of Theorem 2.1 part (i), there exist $c_2>0$ and $n_0\in\mathbb N$ such that $(\hat\phi(x))^{-1}\le c_2(1+|x|^{n_0})$. We arrive at

$$\int\frac{|\hat f(x)|^2}{\hat\phi(x)}\,dx\le c_1c_2\int\frac{1+|x|^{n_0}}{1+|x|^{n_0+2}}\,dx<\infty.$$

By Theorem 10.12 of Wendland (2005), the function $f$ is in $\mathcal N_\phi(\mathbb R)$ if and only if

$$\int|\hat f(x)|^2/\hat\phi(x)\,dx<\infty.$$

This proves the claim for $\Omega=\mathbb R$.

Now assume $\Omega\subset\mathbb R$ is arbitrary and $f\in\mathcal S(\mathbb R)$ with $\operatorname{supp}f\subset\Omega$. We have shown that $f\in\mathcal N_\phi(\mathbb R)$. By Theorem 10.47 in Wendland (2005), for $\Omega\subset\mathbb R$, $f\in\mathcal N_\phi(\mathbb R)$ implies $f|_\Omega\in\mathcal N_\phi(\Omega)$. This proves the claim. □



Note that Lemma 5.1 implies that for any interval $(a,b)\subset\Omega$ there exists a test function $\psi\in\mathcal N_\phi(\Omega)$ with $\operatorname{supp}(\psi)=[a,b]$. One example is

$$\psi(x)=1_{(a,b)}(x)\exp\bigl(-(x-a)^{-1}-(b-x)^{-1}\bigr).$$

This observation can be used to show that point evaluations and integral means are linearly independent as elements of the dual space of $\mathcal N_\phi(\Omega)$.

Definition 5.2. For $\gamma\in\mathbb R$ and $\gamma_1,\gamma_2\in\mathbb R\cup\{-\infty,\infty\}$ with $\gamma_1\le\gamma_2$, define the point evaluation functional $\delta_\gamma(f)=f(\gamma)$ and the functional $\rho_{\gamma_1,\gamma_2}$ by

$$\rho_{\gamma_1,\gamma_2}(f):=\begin{cases}\int_{\gamma_1}^{\gamma_2}f(x)\,dx,&\gamma_1\ne\gamma_2,\\f(\gamma_1),&\gamma_1=\gamma_2.\end{cases}$$



Lemma 5.3. Suppose $\phi$ satisfies the conditions given by Theorem 2.1 part (i). Assume $\tau_0<\dots<\tau_{k+1}$ and $\gamma_1<\dots<\gamma_r$, and that there exists an $\epsilon>0$ such that $(\tau_0-\epsilon,\tau_{k+1}+\epsilon)\subset\Omega$ as well as $(\gamma_1-\epsilon,\gamma_r+\epsilon)\subset\Omega$. Then the functionals $\rho_{\tau_0,\tau_1},\rho_{\tau_1,\tau_2},\dots,\rho_{\tau_k,\tau_{k+1}},\delta_{\gamma_1},\dots,\delta_{\gamma_r}$ are linearly independent as elements of the dual space $\mathcal N_\phi(\Omega)'$.

Proof. Assume

$$\sum_{i=1}^{k+1}\alpha_i\rho_{\tau_{i-1},\tau_i}(f)+\sum_{j=1}^r\beta_j\delta_{\gamma_j}(f)=0$$

for all $f\in\mathcal N_\phi(\Omega)$. For each $i=1,\dots,k+1$ we can find an open interval $J_i\subset(\tau_{i-1},\tau_i)\cap\Omega$ such that $\gamma_j\notin J_i$ for all $j=1,\dots,r$. By Lemma 5.1 we can find test functions $f_i\in\mathcal N_\phi(\Omega)$ with $\operatorname{supp}(f_i)\subset J_i$ and $\int_{\mathbb R}f_i(x)\,dx=1$ for all $i=1,\dots,k+1$. We then have $\rho_{\tau_{l-1},\tau_l}(f_i)=1_{\{l=i\}}$ and $\delta_{\gamma_j}(f_i)=0$ for all $i,l=1,\dots,k+1$ and $j=1,\dots,r$. This leads to

$$0=\sum_{l=1}^{k+1}\alpha_l\rho_{\tau_{l-1},\tau_l}(f_i)+\sum_{j=1}^r\beta_j\delta_{\gamma_j}(f_i)=\alpha_i$$

for all $i=1,\dots,k+1$. Similarly, we can find test functions $\tilde f_j\in\mathcal N_\phi(\Omega)$ with $\delta_{\gamma_l}(\tilde f_j)=1_{\{l=j\}}$ and deduce that $\beta_j=0$ for all $j=1,\dots,r$. This proves the claim. □



Finally, we can prove Theorem 2.1 part (i).

Proof of Theorem 2.1 part (i). Assume

$$\sum_{i=1}^{k+1}\alpha_iA_\phi(\cdot,\tau_{i-1},\tau_i)=0.\tag{30}$$

By continuity of $\phi$, the functions $A_\phi(x,\tau_{i-1},\tau_i)$, and hence $\sum_{i=1}^{k+1}\alpha_iA_\phi(x,\tau_{i-1},\tau_i)$, are continuous in $x$. Consequently, (30) implies

$$0=\sum_{i=1}^{k+1}\alpha_iA_\phi(x,\tau_{i-1},\tau_i)$$

for all $x\in[0,1]$. By definition, $A_\phi(x,\tau_{i-1},\tau_i)=\int_{\tau_{i-1}}^{\tau_i}\phi(x-y)\,dy=\rho_{\tau_{i-1},\tau_i}(\phi(x-\cdot))$, so

$$0=\sum_{i=1}^{k+1}\alpha_iA_\phi(x,\tau_{i-1},\tau_i)=\sum_{i=1}^{k+1}\alpha_i\rho_{\tau_{i-1},\tau_i}(\phi(x-\cdot))$$

for all $x\in[0,1]$. Set $\Omega=[0,1]$. By Theorem 8 in Schaback (1999), the native space $\mathcal N_\phi(\Omega)$ is the closure of the span of the set of functions $\{\phi(x-\cdot):x\in\Omega\}$. Since the $\rho_{\tau_{i-1},\tau_i}$ are continuous linear functionals on $\mathcal N_\phi(\Omega)$, it follows that

$$0=\sum_{i=1}^{k+1}\alpha_i\rho_{\tau_{i-1},\tau_i}(f)$$

for all $f\in\mathcal N_\phi(\Omega)$. By Lemma 5.3, $\rho_{\tau_0,\tau_1},\dots,\rho_{\tau_k,\tau_{k+1}}$ are linearly independent as elements of the dual space $\mathcal N_\phi(\Omega)'$. Consequently, $\alpha_i=0$ for all $i=1,\dots,k+1$, which proves the claim. □
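Below is a numerical sanity check of the ingredients of Theorem 2.1 part (i). It assumes a Gaussian $\phi$, whose Fourier transform is strictly positive, and hypothetical jump points $0,0.25,0.5,0.75,1$. The kernel matrix $[\phi(x_i-x_j)]$ should then have positive eigenvalues, and the matrix of sampled values of $A_\phi(\cdot,\tau_{i-1},\tau_i)=\int_{\tau_{i-1}}^{\tau_i}\phi(\cdot-y)\,dy$ should have full column rank. A finite sample can only suggest linear independence; it does not prove it.

```python
import math
import numpy as np

S = 0.1  # hypothetical bandwidth of the Gaussian kernel phi = N(0, S^2) density
_erf = np.vectorize(math.erf)

def phi(u):
    return np.exp(-0.5 * (u / S) ** 2) / (S * math.sqrt(2.0 * math.pi))

def A(x, a, b):
    """A_phi(x, a, b) = int_a^b phi(x - y) dy = rho_{a,b}(phi(x - .))."""
    cdf = lambda z: 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    return cdf((x - a) / S) - cdf((x - b) / S)

# Positive definiteness of phi: the kernel matrix on distinct points.
pts = np.linspace(0.0, 1.0, 11)
K = phi(pts[:, None] - pts[None, :])
min_eig = np.linalg.eigvalsh(K).min()

# Linear independence of A_phi(., tau_{i-1}, tau_i), i = 1, ..., k + 1, sampled on [0, 1].
tau = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
xs = np.linspace(0.0, 1.0, 400)
M = np.column_stack([A(xs, tau[i], tau[i + 1]) for i in range(len(tau) - 1)])
sing = np.linalg.svd(M, compute_uv=False)
print(min_eig, np.linalg.matrix_rank(M), sing.min())
```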

6. A lower bound for estimating the jump locations

In this section we show that the obtained rate $d(J(\hat f_n),J(f))=O_P(n^{-1/2})$ is optimal in a minimax sense. To do so, we construct functions $f_0,f_{1,n},f_{2,n}$ with $d(J(f_0),J(f_{i,n}))=cn^{-1/2}$ for $i=1,2$ and some $c>0$ to be chosen later. Given the observations

$$Y_i=g(x_i)+\varepsilon_i,\quad i=1,\dots,n,$$

for $g\in\{\Phi f_0,\Phi f_{1,n},\Phi f_{2,n}\}$ and $\varepsilon_1,\dots,\varepsilon_n$ i.i.d. $N(0,\sigma^2)$ with $\sigma^2>0$, we show the following: for any estimator, the probability of choosing the true function stays bounded away from one. It is clearly sufficient to consider the case of a single jump with fixed jump height.

Lemma 6.1. Suppose Assumption B is met and $x_1,\dots,x_n\in[0,1]$ are arbitrary fixed design points. Moreover, assume that $\varepsilon_1,\dots,\varepsilon_n$ are i.i.d. $N(0,\sigma^2)$ with $\sigma^2>0$. Set $g_\tau=\Phi1_{[\tau,\infty)}$ for $\tau\in\mathbb R$. Given observations

$$Y_i=g_\tau(x_i)+\varepsilon_i,\quad i=1,\dots,n,$$

denote the corresponding probability measure by $P_\tau$. There exist constants $c,c_1>0$ such that

$$\inf_{\hat\tau}\sup_\tau P_\tau\bigl(|\hat\tau-\tau|\ge cn^{-1/2}\bigr)\ge c_1>0.$$

The proof uses the following theorem.
Theorem 6.2. Suppose $M\ge2$ and $\Theta$ contains elements $\theta_0,\theta_1,\dots,\theta_M$ with

(i) $d(\theta_j,\theta_k)\ge2s>0$ for all $0\le j<k\le M$,

(ii) $P_{\theta_j}\ll P_{\theta_0}$ for all $j=1,\dots,M$, and

(iii) $\frac1M\sum_{j=1}^Md_K(P_{\theta_j},P_{\theta_0})\le\alpha\log M$

with $0<\alpha<1/10$, where $d_K$ denotes the Kullback-Leibler distance. Then

$$\inf_{\hat\theta}\sup_{\theta\in\Theta}P_\theta\bigl(d(\hat\theta,\theta)\ge s\bigr)\ge\frac{\sqrt M}{1+\sqrt M}\Bigl(1-2\alpha-\sqrt{\frac{2\alpha}{\log M}}\Bigr)>0.$$

Proof. See Theorem 2.5, page 85 in Tsybakov (2004). □



Proof of Lemma 6.1. Recall that for product measures $P=\bigotimes_{i=1}^nP_i$ and $Q=\bigotimes_{i=1}^nQ_i$ we have

$$d_K(P,Q)=\sum_{i=1}^nd_K(P_i,Q_i).$$

Note that $Y_i\sim N(g_\tau(x_i),\sigma^2)$, and denote the corresponding measures by $P_\tau^i$. By independence of the $\varepsilon_i$, the joint measure $P_\tau$ of $Y_1,\dots,Y_n$ is $P_\tau=\bigotimes_{i=1}^nP_\tau^i$. This yields

$$d_K(P_{\tau_1},P_{\tau_2})=(2\sigma^2)^{-1}\sum_{i=1}^n\bigl(g_{\tau_1}(x_i)-g_{\tau_2}(x_i)\bigr)^2.$$

By Assumption B, the integral kernel $\phi$ is bounded in sup-norm. Hence

$$\bigl(g_{\tau_1}(x_i)-g_{\tau_2}(x_i)\bigr)^2=\Bigl(\int_{\tau_1}^{\tau_2}\phi(x_i-y)\,dy\Bigr)^2\le(\tau_1-\tau_2)^2\|\phi\|_\infty^2.$$

Consequently,

$$d_K(P_{\tau_1},P_{\tau_2})\le(2\sigma^2)^{-1}n(\tau_1-\tau_2)^2\|\phi\|_\infty^2.$$

Now choose some $0<\alpha<1/10$, set $c=(2\alpha\sigma^2\log2/\|\phi\|_\infty^2)^{1/2}$ and choose

$$\tau_0\in(\tau_{\mathrm{low}}+cn^{-1/2},\tau_{\mathrm{up}}-cn^{-1/2}).$$

Set $\tau_1=\tau_0+cn^{-1/2}$ and $\tau_2=\tau_0-cn^{-1/2}$. This gives

$$\frac12\sum_{j=1}^2d_K(P_{\tau_j},P_{\tau_0})\le(4\sigma^2)^{-1}n\sum_{j=1}^2(\tau_0-\tau_j)^2\|\phi\|_\infty^2=\alpha\log2.$$

Consequently, the assumptions of Theorem 6.2 are satisfied with $M=2$, $s=n^{-1/2}c/2$ and $d(\tau,\tau')=|\tau-\tau'|$. Application of this theorem gives

$$\inf_{\hat\tau}\sup_\tau P_\tau\bigl(|\hat\tau-\tau|\ge2^{-1}cn^{-1/2}\bigr)\ge\frac{\sqrt2}{1+\sqrt2}\Bigl(1-2\alpha-\sqrt{\frac{2\alpha}{\log2}}\Bigr)>0.$$

This proves the claim. □

Note that the proof uses only the absolute integrability of the integral kernel and its boundedness in supremum norm.
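The lower bound in Lemma 6.1 works because shifting the jump by $cn^{-1/2}$ keeps the Kullback-Leibler distance between the two Gaussian product measures bounded in $n$. The sketch below assumes a Gaussian kernel, $\sigma=0.5$, an equidistant design and $c=0.2$, all hypothetical. It computes $d_K(P_{\tau_1},P_{\tau_2})=(2\sigma^2)^{-1}\sum_i(g_{\tau_1}(x_i)-g_{\tau_2}(x_i))^2$ alongside the bound $(2\sigma^2)^{-1}n(\tau_1-\tau_2)^2\|\phi\|_\infty^2$ from the proof. It also evaluates the right-hand side of Theorem 6.2 for $M=2$.

```python
import math
import numpy as np

S, SIGMA = 0.1, 0.5                              # hypothetical bandwidth and noise level
PHI_SUP = 1.0 / (S * math.sqrt(2.0 * math.pi))   # ||phi||_inf of the N(0, S^2) density
_erf = np.vectorize(math.erf)

def g(x, tau):
    """g_tau(x) = (Phi 1_[tau, inf))(x) = int_tau^inf phi(x - y) dy."""
    return 0.5 * (1.0 + _erf((x - tau) / (S * math.sqrt(2.0))))

def kl_divergence(x, tau1, tau2):
    """d_K(P_tau1, P_tau2) for independent N(g_tau(x_i), SIGMA^2) observations."""
    return float(np.sum((g(x, tau1) - g(x, tau2)) ** 2) / (2.0 * SIGMA ** 2))

def kl_bound(n, tau1, tau2):
    """(2 sigma^2)^{-1} n (tau1 - tau2)^2 ||phi||_inf^2, as in the proof of Lemma 6.1."""
    return n * (tau1 - tau2) ** 2 * PHI_SUP ** 2 / (2.0 * SIGMA ** 2)

def tsybakov_lower_bound(alpha, M):
    """Right-hand side of Theorem 6.2."""
    return math.sqrt(M) / (1 + math.sqrt(M)) * (1 - 2 * alpha - math.sqrt(2 * alpha / math.log(M)))

c = 0.2
results = []
for n in (100, 1_000, 10_000):
    x = np.arange(1, n + 1) / n                 # equidistant design (an assumption)
    tau0, tau1 = 0.5, 0.5 + c / math.sqrt(n)
    results.append((n, kl_divergence(x, tau0, tau1), kl_bound(n, tau0, tau1)))
    print(results[-1])                          # both columns stay O(1) as n grows
print(tsybakov_lower_bound(0.09, 2))
```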



Proof of Theorem 3.3. Lemma 6.1 directly implies that the jump estimator attains the minimax rate. If $f$ is a step function with known jump locations and unknown level heights $b_i$, the inverse regression model (1) reduces to a standard linear regression model. It is well known that in this setting the levels $b_i$ cannot be estimated at a rate faster than $O_P(n^{-1/2})$. Consequently, this also holds when the jump locations are unknown. This proves Theorem 3.3. □



Acknowledgments. The authors wish to thank L. Dümbgen, T. Hohage, R. Schaback and A. Tsybakov for interesting comments and bibliographic information.

L. Boysen gratefully acknowledges support by the Georg Lichtenberg program "Applied Statistics & Empirical Methods" and DFG graduate program 1023 "Identification in Mathematical Models"; A. Munk was supported by DFG grant "Statistical Inverse Problems under Qualitative Shape Constraints" and DFG grant FOR916.

References

Birgé, L. and Massart, P. (2006). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, to appear.


Bissantz, N., Dümbgen, L., Holzmann, H. and Munk, A. (2007). Nonparametric confidence bands in deconvolution density estimation. J. Royal Statist. Society Ser. B 69 483-506.

Boysen, L. (2006). Jump estimation for noisy blurred step function. Ph.D. thesis, Georg-August-Universität Göttingen. URL http://webdoc.sub.gwdg.de/diss/2006/boysen/boysen.pdf

Butucea, C. and Tsybakov, A. (2007). Sharp optimality for density deconvolution with dominating bias. I, II. Theory of Probability and Its Applications, to appear.

Cavalier, L. and Tsybakov, A. (2002). Sharp adaptation for inverse problems with random noise. Prob. Theory Rel. Fields 123 323-354.

Dümbgen, L. and Johns, R. B. (2004). Confidence bands for isotonic median curves using sign tests. J. Comput. Graph. Statist. 13 519-533.

Feder, P. I. (1975). On asymptotic distribution theory in segmented regression problems-identified case. Ann. Statist. 3 49-83.

Goldenshluger, A., Juditsky, A., Tsybakov, A. and Zeevi, A. (2006a). Change-point estimation from indirect observations. 1. Minimax complexity. Preprint.

Goldenshluger, A., Juditsky, A., Tsybakov, A. and Zeevi, A. (2006b). Change-point estimation from indirect observations. 2. Adaptation. Preprint.

Goldenshluger, A., Tsybakov, A. and Zeevi, A. (2006c). Optimal change-point estimation from indirect observations. Ann. Statist. 34 350-372.

Hinkley, D. V. (1969). Inference about the intersection in two-phase regression. Biometrika 56 495-504.

Karlin, S. and Studden, W. J. (1966). Tchebycheff systems: With applications in analysis and statistics. Pure and Applied Mathematics, Vol. XV, Interscience Publishers John Wiley & Sons, New York-London-Sydney.

Koul, H. L., Qian, L. and Surgailis, D. (2003). Asymptotics of M-estimators in two-phase linear regression models. Stochastic Process. Appl. 103 123-154.

Lan, Y., Banerjee, M. and Michailidis, G. (2007). Change-point estimation under adaptive sampling. Technical Report, Univ. of Michigan.
Müller, H.-G. (1992). Change-points in nonparametric regression analysis. Ann. Statist. 20 737-761.
Müller, H.-G. and Stadtmüller, U. (1999). Discontinuous versus smooth regression. Ann. Statist. 27 299-337.
Neumann, M. H. (1997). Optimal change-point estimation in inverse problems. Scand. J. Statist. 24 503-521.
Quandt, R. E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes. J. Amer. Statist. Assoc. 53 873-880.
Quandt, R. E. (1960). Tests of the hypothesis that a linear regression system obeys two separate regimes. J. Amer. Statist. Assoc. 55 324-330.
Raimondo, M. (1998). Minimax estimation of sharp change points. Ann. Statist. 26 1379-1397.
Römer, W., Lam, Y. H., Fischer, D., Watts, A., Fischer, W. B., Göring, P., Wehrspohn, R. B., Gösele, U. and Steinem, C. (2004). Channel activity of a viral transmembrane peptide in micro-BLMs. J. Am. Chem. Soc. 49 16267-16274.
Roths, T., Maier, D., Friedrich, C., Marth, M. and Honerkamp, J. (2000). Determination of the relaxation time spectrum from dynamic moduli using an edge preserving regularization method. Rheol. Acta 39 163-173.
Sacks, J. and Ylvisaker, D. (1970). Designs for regression problems with correlated errors. III. Ann. Math. Statist. 41 2057-2074.
Schaback, R. (1999). Native Hilbert spaces for radial basis functions. I. In New developments in approximation theory (Dortmund, 1998), vol. 132 of Internat. Ser. Numer. Math. Birkhäuser, Basel, 255-282.
Schmitt, E. K., Vrouenraets, M. and Steinem, C. (2006). Channel activity of OmpF monitored in nano-BLMs. Biophys. J. 91 2163-2171.
Sprent, P. (1961). Some hypotheses concerning two phase regression lines. Biometrics 17 634-645.
Tsybakov, A. B. (2004). Introduction à l'estimation non-paramétrique, vol. 41 of Mathématiques & Applications (Berlin) [Mathematics & Applications]. Springer-Verlag, Berlin.
van de Geer, S. A. (1988). Regression analysis and empirical processes, vol. 45 of CWI Tract. Stichting Mathematisch Centrum, Centrum voor Wiskunde en Informatica, Amsterdam.
van de Geer, S. A. (2000). Applications of empirical process theory, vol. 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
van der Vaart, A. W. (1998). Asymptotic statistics, vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
Wendland, H. (2005). Scattered data approximation, vol. 17 of Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge.
Yakir, B., Krieger, A. M. and Pollak, M. (1999). Detecting a change in regression: first-order optimality. Ann. Statist. 27 1896-1913.
Yao, Y.-C. and Au, S. T. (1989). Least-squares estimation of a step function. Sankhya Ser. A 51 370-381.